ABSTRACT

The importance of utilizing multisource data in ground-cover classification
lies in the fact that improvements in classification accuracy can be achieved
at the expense of additional independent features provided by separate
sensors. However, it should be recognized that information and knowledge
from most available data sources in the real world are neither certain nor
complete. We refer to such a body of uncertain, incomplete, and sometimes
inconsistent information as “evidential information.”

The objective of this research is to develop a mathematical framework 
within which various applications can be made with multisource data in remote 
sensing and geographic information systems. The methodology described in 
this report has evolved from “evidential reasoning,” where each data source is 
considered as providing a body of evidence with a certain degree of belief. The 
degrees of belief based on the body of evidence are represented by
“interval-valued (IV) probabilities” rather than by conventional point-valued probabilities
so that uncertainty can be embedded in the measures. 

There are three fundamental problems in the multisource data analysis 
based on IV probabilities: (1) how to represent bodies of evidence by IV 
probabilities, (2) how to combine IV probabilities to give an overall assessment 
of the combined body of evidence, and (3) how to make a decision when the 
statistical evidence is given by IV probabilities. 

This report first introduces an axiomatic approach to IV probabilities, 
where the IV probability is defined by a pair of set-theoretic functions which 
satisfy some pre-specified axioms. On the basis of this approach the report 
focuses on representation of statistical evidence by IV probabilities and 
combination of multiple bodies of evidence. 

Although IV probabilities provide an innovative means for the
representation and combination of evidential information, they make the
decision process rather complicated and call for more intelligent strategies for
making decisions. This report also focuses on the development of decision
rules over IV probabilities from the viewpoint of statistical pattern recognition.

The proposed method, the so-called “evidential reasoning” method, is
applied to the ground-cover classification of a multisource data set consisting of 
Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and 
digital terrain data such as elevation, slope, and aspect. By treating the data 
sources separately, the method is able to capture both parametric and 
nonparametric information and to combine them. 

Then the method is applied to two separate cases of classifying multiband
data obtained by a single sensor. In each case, a set of multiple sources
is obtained by dividing the high-dimensional data into smaller and more
manageable pieces based on the global statistical correlation information. By a
Divide-and-Combine process, the method is able to utilize more features than
the conventional Maximum Likelihood method.
CHAPTER 1
INTRODUCTION 


1.1. Background 


Since developments in digital computers and sensor systems
made it possible to apply the quantitative approach to remote sensing in the 1960s,
information concerning the surface of the Earth and its environment has been
largely extracted from the multispectral data obtained by a single sensor.

Within the last decade, as remote sensing and other data acquisition 
technologies have advanced, there has been a trend towards exploiting 
remotely sensed multispectral data in conjunction with related data from other 
sources for the purpose of extracting higher level information from multi-attribute 
data bases. For instance, the topographic information obtained from digital 
terrain data has been successfully used together with remotely sensed data in 
land cover analysis [Fleming et al. (1979), Franklin et al. (1986), Jones et al. 
(1988), Strahler et al. (1978)]. More recently, many researchers in the 
geographic information processing community have started reconsidering the 
possibility of utilizing remotely sensed data within geographic information 
systems (GIS) [Healey et al. (1988), Quarmby et al. (1988)]. Figure 1.1 depicts a 
typical multi-attribute database in remote sensing and GIS. In general, the
information obtained from multiple sources is more robust and more reliable than that
from a single source. Furthermore, it may resolve ambiguities which might arise
from single-source analysis.

To a large extent, the methods which have been used for the analysis of
multisource data have been ad hoc or often based on qualitative interpretation
techniques, drawing heavily on the expertise and intuition of application
scientists. Whereas techniques for collecting and storing data from multiple
sources (e.g., multispectral scanner, side-looking radar, digital terrain model,
etc.) have evolved rapidly, techniques for extracting and analyzing information
from such complex data bases are still in the beginning stage. With the
advancement in designing sensor systems and the increasing availability of
ancillary data, interest in extracting the great wealth of higher level information
contained in geographic and remote sensing contexts has led to extensive
demand for computer-based, automated (or semi-automated) methods for the
analysis of multisource data. Their development will be hastened further
by the proliferation of varied and sophisticated remote sensing platforms and
sensors in the coming decades.

Figure 1.1 A Multi-Attribute Database in Remote Sensing and GIS.

Unlike the situation in which we are dealing with purely spectral data 
from a single sensor, there are some conceivable problems in devising means 
for multisensor and multisource data analysis. Firstly, there is a difficulty in 
describing the disparate range of data types which have different units of 
measurement. The types of data to be combined cannot be assumed to be 
commensurable. For example, multispectral data represent the energy 
emanating from the scene of interest in different wavelengths while elevation 
data represent the altitude of the scene. Moreover, map-based ancillary data 
such as a soil map may even be nominal in nature. The situation becomes 
more complicated when the multi-attribute data bases include geometric 
characteristics such as lines, shapes, or sizes. 

Secondly, since spatial variation of the attribute in a geographic context, 
such as vegetation cover, soil type, or slope aspect, has an effect on the 
spectral responses obtained from remote sensors, there are possibly significant 
but unknown interactions among multiple data sources. For example, in the 
visible/infrared spectral range the reflected energy measured by a sensor 
depends on properties such as the pigmentation, moisture content and cellular 
structure of vegetation, the mineral and moisture contents of soils, and the level 
of sedimentation of water. However, when there is insufficient knowledge 
concerning the interactions among data sources, the observations obtained 
from the data sources have been treated as independent variables. Such an 
independence assumption should be adopted with caution in the case of a 
statistical multisource data analysis because data sources which appear to be
uninteracting are still unlikely to be statistically independent.

Thirdly, while it is often reasonable to adopt the multivariate Gaussian
distribution to model the probability function of multispectral data alone, this
parametric model is not generally applicable to geographic or
topographic data combined with multispectral data when the form of
their joint probability function is unknown.

Finally, there is an important factor which must be considered in 
combining multiple sources. Since various data sources are in general not 
equally reliable, the data sources usually provide a wide range of degrees of 
support for an observation, sometimes even in an inconsistent manner. Such 
information regarding the relative reliabilities of the sources should be included 
in the multisource data analysis. 

These problems have been the motivation for the development of the 
techniques by which inferences can be drawn systematically from complex data 
bases composed of disparate, unequally reliable sources, regardless of their 
data types and interactions with the other sources. 


1.2. Related Works 

During the last decade, there have been a number of different 
approaches to the analysis of multisource data in remote sensing and 
geographic information systems. The approaches listed in this section are not 
exhaustive of the related works but are representative. 

First of all, the “stacked vector” approach is the most straightforward 
method in which all data sources are considered simultaneously by organizing 
the respective measurements into a single vector. The resulting compound 
vectors are treated as data from a single source. Although this approach has 
been successfully applied to combined multispectral data and terrain data 
[Hoffer et al. (1975)], its use is limited to the situation where the sources are 
similar and their interactions are easily modeled. 

The “layered” approach employed by Fleming et al. (1979) is more 
general in the sense that it can deal with multiple sources of diverse data types 
by treating them separately. This approach has been used for mapping forest 
cover types based on multispectral data and topographic data. Its idea is to 
classify major cover types based on the multispectral data, and then further 
subdivide the cover types to the species level based on the remaining data.
Hutchinson (1982) has developed a similar approach, the so-called “ambiguity
reduction” method, whose basic strategy is to stratify the data based on one (or
more) of the data sources, assess the results, and resort to the other sources to
resolve the remaining ambiguities. A major disadvantage of these two 
approaches is that different groupings or orderings of the sources may produce 
different results. Furthermore, their mathematical schemes cannot incorporate 
the reliabilities and interactions of the sources into the classification process. 

Swain et al. (1985) proposed an approach which can handle an arbitrary 
number of independent data sources. In their mathematical framework, the 
global membership function is derived from Bayes' formula by applying two 
different statistical independence assumptions. Due to the commutative 
property of the global membership function, different orderings of the sources in 
combination do not have an effect on final results. This method has been 
extended by Lee et al. (1987) and Benediktsson et al. (1989a) so that the
relative quality of the sources can be accounted for in the global membership
function.

Although their procedures in combining information from multiple data 
sources are different, the numerical representations of information in the above 
approaches are commonly based on the Bayesian inference, where posterior 
probabilities are defined by the multiplication of prior probabilities and 
observational probabilities. It is very important to recognize that in dealing with 
multispectral data combined with other forms of geographic data, the methods 
employed must be able to cope with uncertainties which arise both from intrinsic 
randomness of data and from ambiguities in modeling and combining disparate
sources.

Recently, learning procedures based on neural networks have been 
applied to the classification of remotely sensed multisource data [Benediktsson 
et al. (1989b)]. Since it is nonparametric in nature, the neural network approach 
is most useful when the distribution functions of data are not known. However,
this approach usually incurs a high computational cost in
training because the learning procedure is iterative.

Meanwhile, in the artificial intelligence and knowledge engineering 
community, there have been a number of attempts to build plausible models for
automated reasoning with multiple information sources [Cohen (1985),
McDermott and Doyle (1979), Shafer (1976a), Zadeh (1965)]. Such attempts
have been embodied as “inference techniques under uncertainty” [Duda et al.
(1976), Dubois and Prade (1980), Ginsberg (1984), Lowrance and Garvey
(1982)] and used in various areas of science and engineering [Blonda et al.
(1989), Duda et al. (1979), Garvey (1987), Garvey et al. (1981), Kim et al.
(1986), Moon (1989), Shortliffe (1976)]. Applications to multisource geographic
and remote sensing data have been rudimentary at best.


1.3. Statement of Problem 

The importance of utilizing multisource data in ground-cover 
classification lies in the fact that it is generally correct to assume that 
improvements in terms of classification accuracy can be achieved at the 
expense of additional independent features provided by separate sensors or 
other forms of data sources. However, it should be recognized that information 
and knowledge from most available sources of data in the real world are neither 
certain nor complete. We refer to such a body of uncertain, incomplete, and 
sometimes inconsistent information as “evidential information.” 

In order for any methodology for multisource data classification to be 
implemented as a quantitative, computer-based technique, the methodology 
must be able to: (1) represent the partial information provided by the individual
sensors as numerical measures, and (2) combine the measures by a 
combination rule to produce the overall assessment of the total evidence. 

Consider the problem of classifying a pixel $X = (x_1, \ldots, x_m)^T$ to one of $n$
classes denoted by $\omega_j$ for $j = 1, \ldots, n$, where $x_i$ ($i = 1, \ldots, m$) is the feature obtained
from the $i$-th source denoted by $S_i$ and the superscript $T$ denotes the vector
transposition. Suppose each data source $S_i$ supports $A$, denoting the event of $X$
belonging to a certain class $\omega_j$, with a degree of belief $B(A \mid x_i) = b_i$. Throughout
the report, the term “degree of belief” or “belief measure” will be used for any
kind of numerical measure representing one’s belief states regarding the
events. Then, the first problem above is equivalent to the construction of belief
measures based on evidential information provided by each data source.

As we mentioned earlier, evidential information is characteristically 
uncertain and incomplete. Therefore, the classical Boolean logic is not 
adequate for representing evidence because it cannot have intermediate states 
between “True” and “False.” In other words, the Boolean expressions never 
capture any notion of the relative strength of partial beliefs. Bayesian 
probabilities have been frequently used to represent partial beliefs. Yet this is 
possible only when there is a sufficient amount of data to estimate the statistical 
parameters of an assumed probability model. Further, there is no appropriate 
way for representing “total ignorance” in a Bayesian framework because the 
Bayesian probabilities should be “additive”, that is,

$$ P(A) + P(\bar{A}) = 1 \tag{1.3.1} $$

where $\bar{A}$ is the complementary event of $A$. To illustrate the consequence of this
requirement, suppose there is no evidence available either for or against the
occurrence of two exclusive and exhaustive events. In the Bayesian framework,
both events are equally assigned a probability of 1/2, which seems quite different
from specifying that nothing is known regarding the occurrence of the events.

Once the belief measures based on individual sources are given, the
next problem is whether we can find a combined degree of belief
$B(A \mid x_1, \ldots, x_m)$, or equivalently, whether we can build a numerical formula such that

$$ B(A \mid x_1, \ldots, x_m) = F(b_1, \ldots, b_m) \tag{1.3.2} $$

If the data sources are not believed to be equally reliable, the relative
reliabilities of the sources must be considered in computing the combined
degree of belief, i.e.,

$$ B(A \mid x_1, \ldots, x_m) = F(b_1, \ldots, b_m;\ \alpha_1, \ldots, \alpha_m) \tag{1.3.3} $$

where the $\alpha_i$'s denote the relative reliabilities of the sources.

When the numerical representation of belief and the formulation of 
combining function depend on the expertise and intuition of human analysts, 
the solutions to the above problems are said to be ad hoc. 
1.4. Objective of the Research

The objective of the research is to develop a mathematical framework for 
dealing effectively with multisource data in remote sensing and GIS and to 
provide a preliminary demonstration of its value. The methodology described in 
this report has evolved from “evidential reasoning,” where each data source is 
considered as providing a body of evidence concerning propositions with 
certain degrees of belief. The degrees of belief based on the body of evidence 
are represented by “interval-valued (IV) probabilities” rather than by 
conventional additive probabilities so that uncertainty can be embedded in the 
measures. 

There are three fundamental problems in the multisource data analysis 
based on IV probabilities: (1) how to represent bodies of evidence by IV 
probabilities, (2) how to combine IV probabilities to give an overall assessment 
of the combined body of evidence, and (3) how to make decisions based on IV 
probabilities. 

There have been various approaches to IV probabilities in the areas of 
philosophy of science and statistics. The primary focus of this report is on the 
unification of various concepts of IV probabilities so that IV probabilities can be 
readily accessible to representation and combination of multiple bodies of 
evidence without any conceptual ambiguities. This report pursues an axiomatic 
approach to IV probabilities, where IV probabilities are defined axiomatically 
based on the least of the common properties which are consistently required in 
the various approaches. Secondarily, this report focuses on formal methods of 
representing statistical evidence by IV probabilities, first based on acceptable 
models in robust estimation of probabilities, and then using the likelihood 
function of observed data. 

We do not propose any brand-new rule for combining multiple bodies of evidence.
Instead, some existing rules are investigated in terms of their inference
mechanisms when they are expressed as set-theoretic functions. Although IV
probabilities provide an innovative means for the representation of evidential 
information, they make the decision process rather complicated. We need more 
intelligent strategies for making decisions. This report addresses the 
development of decision rules over IV probabilities as the counterparts of 
conventional decision rules in statistics.

In this report, the problem of multisource data analysis in remote sensing 
and GIS is viewed as an application area for the use of artificial intelligence and 
knowledge engineering techniques. 


1.5. Thesis Organization 

This report is made up of seven chapters. In this introductory chapter, the 
problems in the analysis of multisource data have been addressed, and the 
objective of the research has been stated. In the following chapter, after 
reviewing various approaches to IV probabilities, an axiomatic approach to IV 
probabilities is introduced. Chapter 3 describes how belief functions for 
statistical evidence can be constructed in the form of IV probabilities. Chapter 4 
examines subjective Bayesian rules and Dempster's rule for combining 
evidence in the sense of satisfying some desirable properties which agree with 
human intuition. Particularly, attention is paid to the inference mechanisms of 
Dempster's rule. In Chapter 5, decision rules over IV probabilities are defined 
on the basis of well-known decision principles in statistics, such as the 
Likelihood Principle and the Minimax Principle. For the purpose of general 
assessments of its ability in capturing and utilizing information in multisource 
data, the approach is applied to the problems of ground-cover classification 
based on multispectral data in conjunction with other sources of data in remote 
sensing. The experimental results are presented in Chapter 6 and compared to 
the performance of a traditional maximum posterior probability classification 
method. Finally, Chapter 7 concludes the report by summarizing and 
suggesting directions for further research. 
CHAPTER 2

APPROACHES TO INTERVAL-VALUED PROBABILITIES 


2.1. Introduction 


Interval-valued probabilities are, in general, a more adequate scheme 
than point-valued probabilities to express one’s state of knowledge in the sense 
of handling uncertain, incomplete evidential information. IV probabilities can be 
thought of as a generalization of conventional additive probabilities, with the 
lower and upper extremes of the interval corresponding to an event being 
bounds for the unknown actual probability of the event. The endpoints of IV 
probabilities are called the “upper probability” and the “lower probability.”

There have been various works introducing the concepts of IV 
probabilities in the areas of philosophy of science and statistics. For example, 
Koopman (1940) derives the upper and lower probabilities based on the 
intuitively evident laws of consistency governing all comparisons in partial 
ordering of non-numerical probabilities. Smith (1961) proposes a system of IV 
probabilities by considering the strength of one's belief in betting odds as an 
interval. Good (1962) considers the upper and lower probabilities of an event 
by analogy with the outer and inner measures of a non-measurable set. 
Dempster (1967) formulates a system of upper and lower probabilities induced 
by a set-theoretic multivalued mapping. Suppes and Zanotti (1977) show how 
a random relation generates upper and lower probabilities in the set-theoretic 
image space. And Walley and Fine (1982) present a frequentist account of IV 
probabilities based on a finite event algebra. 

Among the above approaches, only Dempster’s and Walley and Fine’s 
models are useful for parametric statistical inference. Dempster’s work and 
Shafer’s mathematical theory of evidence [Shafer (1976a)], together called 
“Dempster-Shafer theory,” have shown their usefulness in various evidential 
reasoning systems [Garvey (1987), Garvey et al. (1981), Zhang and Chen
(1987)]. Walley and Fine’s approach provides the fundamental concepts of a 
frequentist theory of statistics for IV probabilities. Their results indicate that an 
objectivist or frequency-oriented view of probability does not necessitate an 
additive probability concept, and that IV probability models can represent a type 
of indeterminacy not captured by additive probabilities. In the following two 
sections, both approaches will be briefly reviewed. 

Although the mathematical rationales behind the approaches listed 
above are different, there are some common properties of IV probabilities which 
are consistently required. This chapter introduces an axiomatic approach to IV 
probabilities, where IV probabilities are defined by a pair of set-theoretic 
functions satisfying the common properties, so that conceptual ambiguities can 
be avoided. 


2.2. Dempster-Shafer Theory 

In his 1960s works, Dempster (1967, 1968) proposed a generalized
scheme of statistical inference about a parameter space by introducing upper 
and lower probabilities induced by a multivalued mapping. His scheme has 
been further developed and recast as a “mathematical theory of evidence” by 
Shafer. In this section, after briefly recalling the concepts of Dempster’s upper 
and lower probabilities, we discuss the formal framework of Shafer’s theory in 
the aspect of evidential reasoning. 

Suppose we have a pair of spaces $X$ and $\Omega$ denoting respectively a
sample space and a finite parameter space. Let $\Gamma$ be a multivalued mapping
which assigns a subset $\Gamma x \subset \Omega$ to every $x \in X$ and let $\mu$ be a probability
measure assigning probabilities to the members of the class $\mathcal{F}$ of subsets of $X$.
Then, $(X, \mathcal{F}, \mu)$ is a probability space, and this model corresponds to a random
experiment where the outcome cannot be precisely observed but can only be
located in a subset of all possible outcomes.

For any $A \subset \Omega$, define

$$ A^* = \{\, x \in X \mid \Gamma x \cap A \neq \emptyset \,\} \tag{2.2.1} $$

and

$$ A_* = \{\, x \in X \mid \Gamma x \subset A,\ \Gamma x \neq \emptyset \,\} \tag{2.2.2} $$

$A^*$ consists of those $x \in X$ which can possibly correspond under $\Gamma$ to an $\omega \in A$,
while $A_*$ consists of those $x \in X$ which must lead to an $\omega \in A$. Then, the upper
probability and the lower probability of $A$ are defined respectively as:

$$ P^*(A) = \frac{\mu(A^*)}{\mu(\Omega^*)} \tag{2.2.3} $$

$$ P_*(A) = \frac{\mu(A_*)}{\mu(\Omega_*)} \tag{2.2.4} $$


where $\Omega^* = \Omega_*$ is the domain of $\Gamma$. Note that $P^*(A)$ and $P_*(A)$ are defined only if
$\mu(\Omega^*) \neq 0$. Since $A^*$ consists of those $x \in X$ which can possibly correspond
under $\Gamma$ to an $\omega \in A$, $\mu(A^*)$ may be regarded as the largest possible amount of
probability which can be transferred to the outcomes $\omega \in A$ from the measure $\mu$.
Similarly, $A_*$ consists of those $x \in X$ which must lead to an $\omega \in A$. So, $\mu(A_*)$
represents the minimal amount of probability which can be transferred to the
outcomes $\omega \in A$. The denominator $\mu(\Omega^*) = \mu(\Omega_*)$ in eq. (2.2.3) and eq. (2.2.4)
is a normalizing factor. The normalization is necessary in the case where there
is any $x \in X$ which does not map into any non-empty subset of $\Omega$. In this case, the subset
$\{\, x \in X \mid \Gamma x = \emptyset \,\}$ must be removed from $X$, and the measure of the remaining
set $\Omega^*$ should be renormalized to unity.

Dempster has assumed that the actual probability measure of $A$, $P(A)$,
lies in the interval $[P_*(A), P^*(A)]$ such that

$$ P_*(A) \le P(A) \le P^*(A) \tag{2.2.5} $$

The degree of uncertainty concerning the true value of $P(A)$ is represented by
the width, $P^*(A) - P_*(A)$, of the interval.
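
The construction in eqs. (2.2.1) through (2.2.5) can be illustrated with a small sketch for a finite sample space. The function name `upper_lower`, the dictionary representation of $\mu$ and $\Gamma$, and the three-point example are assumptions made only for this illustration; they are not part of Dempster's formulation.

```python
from fractions import Fraction

def upper_lower(mu, gamma, A):
    """Return (P_lower(A), P_upper(A)) induced by a multivalued mapping.

    mu    : dict mapping each sample point x to its probability mu({x})
    gamma : dict mapping each x to the set Gamma(x), a subset of Omega
            (possibly empty)
    A     : set of parameter values, a subset of Omega
    """
    A = set(A)
    # Domain of Gamma: sample points whose image is non-empty.
    domain = [x for x in mu if gamma[x]]
    norm = sum(mu[x] for x in domain)          # normalizing factor
    if norm == 0:
        raise ValueError("upper and lower probabilities are undefined")
    # A^*: images that intersect A (eq. 2.2.1).
    upper = sum(mu[x] for x in domain if gamma[x] & A)
    # A_*: non-empty images contained in A (eq. 2.2.2).
    lower = sum(mu[x] for x in domain if gamma[x] <= A)
    return lower / norm, upper / norm

# Hypothetical example: x3 maps to the empty set and is normalized away.
mu = {"x1": Fraction(1, 2), "x2": Fraction(3, 10), "x3": Fraction(1, 5)}
gamma = {"x1": {"forest"}, "x2": {"forest", "water"}, "x3": set()}

print(upper_lower(mu, gamma, {"forest"}))  # (Fraction(5, 8), Fraction(1, 1))
print(upper_lower(mu, gamma, {"water"}))   # (Fraction(0, 1), Fraction(3, 8))
```

In this example the interval for "forest" is [5/8, 1] and for "water" is [0, 3/8]; the widths reflect the mass of `x2`, whose image does not single out either value.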

In Shafer’s theory, $\Omega$ is called the “frame of discernment” containing a
finite number of exhaustive and mutually exclusive propositions. $2^\Omega$ denotes
the set of all possible subsets of $\Omega$. His theory of evidence may begin by
defining “basic probability assignment”:

$$ m : 2^\Omega \to [0, 1] \tag{2.2.6} $$

where $m$ satisfies the following conditions,

$$ \text{(1)}\quad m(\emptyset) = 0 \tag{2.2.7} $$

$$ \text{(2)}\quad \sum_{A \subset \Omega} m(A) = 1 \tag{2.2.8} $$

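Given a basic probability assignment $m$ over $2^\Omega$, Shafer’s “belief
function” $\mathrm{Bel} : 2^\Omega \to [0, 1]$ is obtained as:

$$ \mathrm{Bel}(A) = \sum_{B \subset A} m(B) \tag{2.2.9} $$

It satisfies the following conditions:

(1) $\mathrm{Bel}(\emptyset) = 0$   (2.2.10)

(2) $\mathrm{Bel}(\Omega) = 1$   (2.2.11)

(3) For every integer $n$ and every collection $A_1, \ldots, A_n$ of subsets of $\Omega$,

$$ \mathrm{Bel}(A_1 \cup \cdots \cup A_n) \ge \sum_{i} \mathrm{Bel}(A_i) - \sum_{i<j} \mathrm{Bel}(A_i \cap A_j) + \cdots + (-1)^{n+1}\, \mathrm{Bel}(A_1 \cap \cdots \cap A_n) \tag{2.2.12} $$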

The basic probability assignment which produces a given belief function is 
uniquely recovered from the belief function by the inverse formula of eq. (2.2.9) 
[see Shafer (1976a)]: 


$$ m(A) = \sum_{B \subset A} (-1)^{|A - B|}\, \mathrm{Bel}(B) \quad \text{for all } A \subset \Omega \tag{2.2.13} $$


where |C| denotes the cardinality of a set C. 

The basic probability number of a set $A \subset \Omega$, $m(A)$, may be understood
as the exact measure of belief that the knowledge source has committed to $A$.
$A$ is called a “focal element” of the belief function $\mathrm{Bel}$ over $\Omega$ if $m(A) > 0$. The
measure ascribed to the frame of discernment, $m(\Omega)$, represents the degree of
ignorance, i.e., the portion of belief that could not be assigned to any smaller
subset of $\Omega$ based on the evidence at hand. It may be committed to some
subsets with the help of additional information. $\mathrm{Bel}(A)$ represents the measure
of the total belief committed to $A$. In fact, eq. (2.2.9) reflects the basic intuition
that a portion of belief committed to a proposition is also committed to any other
proposition it implies.

While $\mathrm{Bel}(A)$ describes one’s belief about $A$, it does not reveal to what
extent one doubts $A$, i.e., to what extent one believes the negation of $A$, $\bar{A}$.
Once $\mathrm{Bel}(\bar{A})$ is known, the upper probability of $A$ is defined as:

$$ \mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) \tag{2.2.14} $$

In the evidential reasoning based on Shafer’s theory, $\mathrm{Bel}(A)$ is called
“degree of support” representing the extent to which a given body of evidence
supports $A$, while $\mathrm{Pl}(A)$ is called “degree of plausibility” representing the extent
to which the body of evidence fails to refute $A$.
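
As a minimal sketch of eqs. (2.2.9), (2.2.13), and (2.2.14), the following code computes the degree of support and the degree of plausibility from a basic probability assignment and recovers the assignment from the belief function. The function names, the three-class frame, and the mass values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(m, A):
    """Eq. (2.2.9): total mass of the focal elements contained in A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A, frame):
    """Eq. (2.2.14): one minus the belief in the complement of A."""
    return 1 - bel(m, frozenset(frame) - frozenset(A))

def mass_from_bel(bel_fn, frame):
    """Eq. (2.2.13): recover m from a belief function on the frame."""
    return {A: sum((-1) ** len(A - B) * bel_fn(B) for B in subsets(A))
            for A in subsets(frame)}

# Hypothetical frame of discernment and basic probability assignment.
frame = frozenset({"forest", "water", "urban"})
m = {frozenset({"forest"}): Fraction(1, 2),
     frozenset({"forest", "water"}): Fraction(3, 10),
     frame: Fraction(1, 5)}              # m(frame) is the degree of ignorance

print(bel(m, {"forest"}), pl(m, {"forest"}, frame))   # 1/2 1

# Total ignorance: all mass on the frame, so every proper non-empty
# subset receives the interval [0, 1].
vacuous = {frame: Fraction(1)}
print(bel(vacuous, {"forest"}), pl(vacuous, {"forest"}, frame))  # 0 1
```

The vacuous assignment shows how total ignorance is expressed without forcing an equal split of probability between competing propositions.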


2.3. A Frequentist Theory of Upper and Lower Probabilities 

Walley and Fine (1982) give a limiting frequentist interpretation of $P_*$ and
$P^*$ as “lim inf” and “lim sup” of relative frequencies in hypothetical unlinked
repetitions of an experiment, which is a generalization of the usual limiting
frequentist interpretation of additive probabilities. Their results provide the
statistical basis whereby IV probability models of random experiments can be
inferred from observations made on unlinked repetitions. This section briefly
describes the link between relative frequencies and IV probabilities.

Let $\mathcal{B}$ be a Boolean algebra of subsets of $\Omega$. Suppose that propensities
of events $A \in \mathcal{B}$ in independent, identically distributed (iid) repetitions $E_1, \ldots, E_n$
are represented through the lower probability $P_*$. To provide a connection
between frequency and propensity, $P_*$ is inferred or estimated from relative
frequency data. Let $r_j$ denote the relative frequencies of all events in the first $j$
repetitions $E_1, \ldots, E_j$. More reliable information regarding the underlying marginal
probability $P_*$ can be obtained on the basis of the outcomes of the repeated experiments than from the
relative frequencies observed at any particular single experiment $E_i$. Walley and
Fine propose an estimator

$$ \underline{P}_n(A) = \min \{\, r_j(A) : k(n) \le j \le n \,\} \quad \text{for all } A \in 2^\Omega \tag{2.3.1} $$

where $k$ is some function such that $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$ (e.g., $k(n) = [\sqrt{n}]$).
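
The minimum estimator of eq. (2.3.1) can be sketched as follows for a finite sequence of observed outcomes. The function name, the representation of an event as a set of outcomes, and the choice $k(n) = [\sqrt{n}]$ (the example mentioned in the text) are assumptions made for this illustration.

```python
import math

def lower_probability_estimate(outcomes, event):
    """Minimum of the running relative frequencies r_j(event), k(n) <= j <= n.

    outcomes : sequence of observed outcomes, one per repetition
    event    : set of outcomes that make up the event A
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one repetition")
    k = max(1, math.isqrt(n))      # k(n) = [sqrt(n)], an example choice
    hits = 0
    best = 1.0
    for j, outcome in enumerate(outcomes, start=1):
        hits += outcome in event   # running count of occurrences of A
        if j >= k:                 # only frequencies from trial k(n) onward
            best = min(best, hits / j)
    return best

# Hypothetical record of nine repetitions; 1 means the event occurred.
trials = [1, 0, 1, 1, 0, 1, 0, 0, 1]
print(lower_probability_estimate(trials, {1}))  # 0.5
```

Here the final relative frequency is 5/9, but the estimate is 0.5 because the running frequency dipped to 4/8 at the eighth trial; the estimator thus retains information about the past evolution of the sequence.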

Although it is not “optimal” in any sense, the above minimum estimator
makes use of the additional information concerning the past evolution of the
sequence of relative frequencies. The estimator has asymptotic properties in a
sequence of infinite trials, and parallels Bernoulli’s law of large numbers.
There is no explicit description of the corresponding upper estimator $\overline{P}_n$ in terms of
relative frequencies. However,
the upper probability is given in terms of upper and lower “envelopes” which will
be described in the next section.


2.4. Axiomatic Approach 

A system of IV probability derived from the definitions and specifications 
of a particular mathematical or statistical concept may cause complications 
resulting from the need to satisfy underlying assumptions of the system. In the 
axiomatic approach, IV probabilities are formulated by defining the upper and 
lower probabilities of the interval as set-theoretic functions which satisfy some 
pre-specified axioms. 

Definition 2.1. [Suppes (1974)] Let $\mathcal{B}$ be a Boolean algebra of subsets of $\Omega$.
The interval-valued probability $[L, U]$ over $\mathcal{B}$ is defined by the set-theoretic
functions

$$ \text{lower probability function}\quad L : \mathcal{B} \to [0, 1] \tag{2.4.1} $$

$$ \text{upper probability function}\quad U : \mathcal{B} \to [0, 1] \tag{2.4.2} $$

satisfying the following conditions:

I. $U(A) \ge L(A) \ge 0$ for all $A \in \mathcal{B}$   (2.4.3)

II. $U(\Omega) = L(\Omega) = 1$   (2.4.4)

III. For any $A, B \in \mathcal{B}$ with $A \cap B = \emptyset$,

$$ L(A \cup B) \ge L(A) + L(B) \quad \text{(super-additivity of } L\text{)} \tag{2.4.5} $$

$$ U(A \cup B) \le U(A) + U(B) \quad \text{(sub-additivity of } U\text{)} \tag{2.4.6} $$

$$ L(A \cup B) \le L(A) + U(B) \le U(A \cup B) \quad \text{(mixed-additivity of } L \text{ and } U\text{)} \tag{2.4.7} $$


These conditions are the least requirements on $\mathcal{L}$ and $\mathcal{U}$ for further development
of the theory of IV probability. The following lemma sets forth some significant
properties of IV probabilities as simple consequences of the above definition.

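Lemma 2.1. For any $A, B \in \mathcal{B}$, the interval-valued probability $[\mathcal{L}, \mathcal{U}]$ has the
following properties:

(i) $\mathcal{L}(A) + \mathcal{U}(\bar{A}) = 1$ (2.4.8)

(ii) $\mathcal{L}(\emptyset) = \mathcal{U}(\emptyset) = 0$ (2.4.9)

(iii) If $A \subset B$ then $\mathcal{L}(A) \le \mathcal{L}(B)$ and $\mathcal{U}(A) \le \mathcal{U}(B)$ (2.4.10)

(iv) $\mathcal{L}(A) + \mathcal{L}(B) \le 1 + \mathcal{L}(A \cap B)$ (2.4.11)

(v) $\mathcal{U}(A) + \mathcal{U}(B) \ge 1 + \mathcal{U}(A \cap B)$ (2.4.12)

Proof. (i) follows immediately from eq. (2.4.4) and eq. (2.4.7). (ii) is obtained by
eq. (2.4.4) and eq. (2.4.8). For (iii), if $A \subset B$ then by eq. (2.4.7)

$\mathcal{U}(B) = \mathcal{U}(A \cup (B-A)) \ge \mathcal{U}(A) + \mathcal{L}(B-A)$

and by eq. (2.4.5)

$\mathcal{L}(B) = \mathcal{L}(A \cup (B-A)) \ge \mathcal{L}(A) + \mathcal{L}(B-A)$

Since $\mathcal{L}(B-A) \ge 0$ from eq. (2.4.3),

$\mathcal{U}(A) \le \mathcal{U}(B)$ and $\mathcal{L}(A) \le \mathcal{L}(B)$

For (iv),

$\mathcal{L}(A) + \mathcal{L}(B) \le 1 - \mathcal{U}(\bar{A}) + \mathcal{L}(\bar{A} \cup B)$ (By eq. (2.4.8) & eq. (2.4.10))

$= 1 - \mathcal{U}(\bar{A}) + 1 - \mathcal{U}(A \cap \bar{B})$ (By eq. (2.4.8))

$\le 2 - \mathcal{U}(\bar{A} \cup \bar{B})$ (By eq. (2.4.6))

$= 1 + \mathcal{L}(A \cap B)$ (By eq. (2.4.8))

Likewise, (v) can be proved. ■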



The following definition given by Huber (1973) connects the upper and 
lower probabilities to the supremum and infimum of a class of probability 
measures. This connection becomes essential later in Section 3.2 where IV 
probabilities are constructed by some models in robust estimation of probability 
measures. 

Definition 2.2. Let $\mathcal{M}$ be the set of all probability measures on a Boolean
algebra $\mathcal{B}$ of all subsets of $\Omega$ and $\mathcal{P}$ an arbitrary non-empty subset of $\mathcal{M}$. $[\mathcal{L}, \mathcal{U}]$ is
said to be "representable" by $\mathcal{P}$ if $\mathcal{L}$ and $\mathcal{U}$ can be defined as:

$$\mathcal{L}(A) = \inf \{\, \pi(A) : \pi \in \mathcal{P} \,\} \qquad (2.4.13)$$

and

$$\mathcal{U}(A) = \sup \{\, \pi(A) : \pi \in \mathcal{P} \,\} \qquad (2.4.14)$$

for all $A \in \mathcal{B}$. In this particular case $\mathcal{L}$ and $\mathcal{U}$ are called a "lower envelope" and
an "upper envelope" respectively.

It has been proven by Huber and Strassen (1973) that if $[\mathcal{L}, \mathcal{U}]$ is an envelope,
then it is an IV probability. The converse is not always true. The following
example from Huber (1981) illustrates such a case. In fact, $[\mathcal{L}, \mathcal{U}]$ being an IV
probability does not imply even the existence of the class $\mathcal{P}$ of probability
measures.


Example 2.1. Let $\Omega$ have cardinality $|\Omega| = 4$, and assume that $\mathcal{L}(A)$ and
$\mathcal{U}(A)$ depend only on the cardinality of $A \subseteq \Omega$, according to the following table:

    |A|              0     1     2     3     4
    $\mathcal{L}$    0     0    1/2   1/2    1
    $\mathcal{U}$    0    1/2   1/2    1     1

Then $[\mathcal{L}, \mathcal{U}]$ satisfies the IV probability's conditions in Definition 2.1, but there is
only a single additive set function between $\mathcal{L}$ and $\mathcal{U}$, namely $P(A) = |A|/4$; hence
$[\mathcal{L}, \mathcal{U}]$ is not representable.
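The claims of Example 2.1 can be checked mechanically on the four-point space. The sketch below is illustrative only: the helper names are hypothetical, the bounds are taken as 0, 0, 1/2, 1/2, 1 (lower) and 0, 1/2, 1/2, 1, 1 (upper) for cardinalities 0 through 4, and the code tests the three conditions of Definition 2.1 by brute force over all pairs of disjoint subsets.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(4))
HALF = Fraction(1, 2)
LOWER_BY_SIZE = [0, 0, HALF, HALF, 1]  # lower bound as a function of |A|
UPPER_BY_SIZE = [0, HALF, HALF, 1, 1]  # upper bound as a function of |A|


def lower(a):
    return LOWER_BY_SIZE[len(a)]


def upper(a):
    return UPPER_BY_SIZE[len(a)]


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_iv_probability():
    """Check conditions I-III of Definition 2.1 exhaustively."""
    if lower(OMEGA) != 1 or upper(OMEGA) != 1:
        return False
    all_sets = subsets(OMEGA)
    for a in all_sets:
        if not (upper(a) >= lower(a) >= 0):
            return False
        for b in all_sets:
            if a & b:
                continue  # condition III only concerns disjoint pairs
            u = a | b
            if lower(u) < lower(a) + lower(b):      # super-additivity
                return False
            if upper(u) > upper(a) + upper(b):      # sub-additivity
                return False
            if not (lower(u) <= lower(a) + upper(b) <= upper(u)):  # mixed
                return False
    return True


def uniform_is_between():
    """P(A) = |A|/4 lies between the lower and upper bounds everywhere."""
    return all(lower(a) <= Fraction(len(a), 4) <= upper(a)
               for a in subsets(OMEGA))


print(is_iv_probability(), uniform_is_between())
```

Because the bounds coincide at 1/2 on every two-element set, any additive function between them must give each pair mass 1/2 and hence each point mass 1/4. That function never attains the lower bound 0 on singletons, which is why the pair cannot be an envelope.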

The following definition and lemma result in interesting subclasses of IV
probabilities by requiring relatively stronger constraints on $\mathcal{L}$ and $\mathcal{U}$.

Definition 2.3. [Choquet (1953)] The lower probability function $\mathcal{L}$ in Definition
2.1 is said to be "monotone of order n" or briefly "n-monotone", where n ($\ge 2$) is
a positive integer, if for every collection $A_1, A_2, \ldots, A_n$ of subsets of $\Omega$

$$\mathcal{L}(A_1 \cup \cdots \cup A_n) \ge \sum_i \mathcal{L}(A_i) - \sum_{i<j} \mathcal{L}(A_i \cap A_j) + \cdots + (-1)^{n+1} \mathcal{L}(A_1 \cap \cdots \cap A_n) \qquad (2.4.15)$$

The conjugate upper probability function $\mathcal{U}$ is said to be "alternating of order n"
or "n-alternating" and satisfies

$$\mathcal{U}(A_1 \cup \cdots \cup A_n) \le \sum_i \mathcal{U}(A_i) - \sum_{i<j} \mathcal{U}(A_i \cap A_j) + \cdots + (-1)^{n+1} \mathcal{U}(A_1 \cap \cdots \cap A_n) \qquad (2.4.16)$$

It is known that if $\mathcal{L}$ ($\mathcal{U}$) is monotone (alternating) of order n, then it is also
monotone (alternating) of order k for any integer $2 \le k \le n$. In particular, when
k = 2, $\mathcal{L}$ and $\mathcal{U}$ have the following properties:

$\mathcal{L}(A_1 \cup A_2) \ge \mathcal{L}(A_1) + \mathcal{L}(A_2) - \mathcal{L}(A_1 \cap A_2)$ (2-monotone) (2.4.17)

$\mathcal{U}(A_1 \cup A_2) \le \mathcal{U}(A_1) + \mathcal{U}(A_2) - \mathcal{U}(A_1 \cap A_2)$ (2-alternating) (2.4.18)

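The following lemma shows that $[\mathcal{L}, \mathcal{U}]$ satisfying the above equations is an IV
probability.

Lemma 2.2. If $\mathcal{L}$ and $\mathcal{U}$ are respectively 2-monotone and 2-alternating and
satisfy the following conditions for all $A \in \mathcal{B}$:

(i) $\mathcal{U}(A) \ge \mathcal{L}(A) \ge 0$ (2.4.19)

(ii) $\mathcal{U}(\Omega) = \mathcal{L}(\Omega) = 1$ (2.4.20)

(iii) $\mathcal{L}(A) + \mathcal{U}(\bar{A}) = 1$ (2.4.21)

then $[\mathcal{L}, \mathcal{U}]$ is an IV probability. The converse is not necessarily true.

Proof. To prove this lemma, we only need to show that $\mathcal{L}$ and $\mathcal{U}$ are super-additive,
sub-additive, and mixed-additive as in Definition 2.1. For any $A, B \in \mathcal{B}$,
if $A \cap B = \emptyset$, from eq. (2.4.17) "2-monotone" implies "super-additive", and from
eq. (2.4.18) "2-alternating" implies "sub-additive." When $A \cap B = \emptyset$,
$\bar{A} = \overline{(A \cup B)} \cup B$. Using eq. (2.4.5) and eq. (2.4.21),

$\mathcal{L}(\bar{A}) = \mathcal{L}(\overline{(A \cup B)} \cup B) \ge \mathcal{L}(\overline{A \cup B}) + \mathcal{L}(B) = 1 - \mathcal{U}(A \cup B) + \mathcal{L}(B)$

and, since $\mathcal{L}(\bar{A}) = 1 - \mathcal{U}(A)$,

$\mathcal{U}(A \cup B) \ge \mathcal{U}(A) + \mathcal{L}(B)$

Likewise,

$\mathcal{U}(\bar{B}) = \mathcal{U}(A \cup \overline{(A \cup B)}) \le \mathcal{U}(A) + \mathcal{U}(\overline{A \cup B}) = \mathcal{U}(A) + 1 - \mathcal{L}(A \cup B)$

and, since $\mathcal{U}(\bar{B}) = 1 - \mathcal{L}(B)$,

$\mathcal{L}(A \cup B) \le \mathcal{U}(A) + \mathcal{L}(B)$

Hence, $\mathcal{L}$ and $\mathcal{U}$ have mixed-additivity, and the above lemma is proved. ■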

By comparing eq. (2.4.15) with eq. (2.2.11), Shafer's belief function $Bel$ is n-monotone.
Consequently, $Pl$ is n-alternating. According to the above lemma,
$Bel$ along with $Pl$ formulates a subclass of IV probabilities. We can summarize
the implicative relationship among IV probabilities and their subclasses as
follows:

$\mathcal{L}$ is n-monotone and $\mathcal{U}$ is n-alternating for $n \ge 2$ $\Rightarrow$ $\mathcal{L}$ is 2-monotone and
$\mathcal{U}$ is 2-alternating $\Rightarrow$ $[\mathcal{L}, \mathcal{U}]$ is an envelope $\Rightarrow$ $[\mathcal{L}, \mathcal{U}]$ is an IV probability.

In practical applications, 2-monotone and 2-alternating IV probabilities seem to
be sufficiently general and mathematically amenable to develop an alternative
statistical inferencing scheme to Bayesian inferencing.


2.5. Summary 

In this chapter, we have discussed the axiomatic approach to IV 
probabilities whose mathematical framework is the theoretical basis of the 
contents treated in the rest of this report. The axiomatic IV probability was 
represented first by the pair of set functions and then by the supremum and 
infimum of a class of probability measures. Subclasses of IV probabilities were
introduced.

IV probabilities as a generalization of additive probabilities give rise to 
some advantages such as representing a certain type of indeterminacy or 
uncertainty not captured by additive probabilities. The choice between 
deterministic, additive probability and IV probability models will depend on our 
background knowledge concerning the context of particular applications, and 
especially the amount and reliability of the information available to help in 
specifying the model. 

In this chapter, the contribution of this research is in a unification of 
various concepts of IV probabilities so that IV probabilities can be readily 
accessible to representation and combination of multiple bodies of evidence. 
Lemmas 2.1 and 2.2 are originally formulated and proved in this report.


CHAPTER 3

REPRESENTATION OF BELIEF FOR STATISTICAL EVIDENCE 


3.1. Introduction 


When a body of evidence is based on the outcomes of statistical 
experiments known to be governed by any (objective) probability models, it is 
called “statistical evidence.” One of the fundamental problems in applying IV 
probabilities to real-world problems is how to represent a body of statistical 
evidence by IV belief functions. In fact, the utility of any existing system of IV 
probabilities is limited by the lack of effective approaches to quantitative 
representation of bodies of evidence. Throughout this chapter a lower
probability and an upper probability are respectively called a "support function
($Sp$)" and a "plausibility function ($Pl$)," implying that they provide belief measures
for the class of subsets of a finite space $\Omega$ based on a body of evidence.

The most extreme type of interval-valued belief function is the "vacuous
belief function" defined as

$$Sp(A) = \begin{cases} 0 & \text{if } A \ne \Omega \\ 1 & \text{if } A = \Omega \end{cases} \qquad (3.1.1)$$

and

$$Pl(A) = \begin{cases} 0 & \text{if } A = \emptyset \\ 1 & \text{if } A \ne \emptyset \end{cases} \qquad (3.1.2)$$

The vacuous belief function assigns [0,1] to every non-empty proper subset A of $\Omega$, and
[1,1] to $\Omega$ itself. Its only focal element is $\Omega$. It is a natural model for representing
complete ignorance, that is, no evidence about $\Omega$ at all.

The next simple type is a "simple support function", a belief function
based on "homogeneous" evidence, that is, a body of evidence which precisely and
unambiguously supports a single non-empty subset of $\Omega$. Suppose $Sp$ is a
simple support function focused on a subset A, and let $Sp(A) = s$ ($0 < s < 1$).
Then the support function for any $B \subseteq \Omega$ is given by

$$Sp(B) = \begin{cases} 0 & \text{if } B \not\supseteq A \\ s & \text{if } B \supseteq A \text{ but } B \ne \Omega \\ 1 & \text{if } B = \Omega \end{cases} \qquad (3.1.3)$$

It can be easily shown that a simple support function is 2-monotone. The
conjugate plausibility function of the above support function is given by

$$Pl(B) = \begin{cases} 1 - s & \text{if } A \cap B = \emptyset \\ 1 & \text{if } A \cap B \ne \emptyset \end{cases} \qquad (3.1.4)$$

The effect of the evidence represented by the simple support function in eq.
(3.1.3) is limited to providing a degree of support s for A and any subset B of $\Omega$
implied by A.
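Equations (3.1.3) and (3.1.4) translate directly into code. The following is a minimal sketch, with the function name `simple_support` and the set-based representation chosen here for illustration; the plausibility is obtained through the conjugate relation $Pl(B) = 1 - Sp(\bar{B})$ rather than coded separately, so for the empty set it returns 0.

```python
def simple_support(focus, s, omega):
    """Return (sp, pl) for a simple support function focused on `focus`."""
    focus, omega = frozenset(focus), frozenset(omega)

    def sp(b):
        b = frozenset(b)
        if b == omega:
            return 1.0
        # Support s goes to every set that contains the focus, eq. (3.1.3)
        return s if focus <= b else 0.0

    def pl(b):
        # Conjugate relation: Pl(B) = 1 - Sp(complement of B)
        return 1.0 - sp(omega - frozenset(b))

    return sp, pl


sp, pl = simple_support({"w1"}, 0.7, {"w1", "w2", "w3"})
print(sp({"w1", "w2"}), sp({"w2"}), pl({"w2", "w3"}), pl({"w1", "w2"}))
```

Setting `s = 0` in this sketch reproduces the vacuous belief function of eqs. (3.1.1) and (3.1.2).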

The next section introduces a possible way of constructing interval- 
valued belief functions based on some models in robust statistics. Shafer 
(1976b) presents two different methods for constructing belief functions based 
on a body of statistical evidence: the “linear plausibility method” and the 
“simplicial plausibility method.” Section 3.3 examines the characteristics of the 
belief function in the linear plausibility method and provides its generalized 
scheme by weakening an assumption underlying it. The result of the second 
method, which is the same as that of Dempster's structure of the second kind 
[Dempster (1968)], is outside the scope of this report because it applies to an 
infinite space Q which parametrizes all multinomial distributions and 
consequently presents formidable computational difficulties. Section 3.4 
discusses the quantitative representation of source reliability in the context of 
pixel classification of multiple data sources. 


3.2. Belief Functions based on Robust Estimation of Probability 
Measures 

In robust statistics, the true underlying probability distribution is assumed
to lie in a certain neighborhood of an idealized model distribution. The
neighborhood describes inaccuracies in the specification of the true distribution.
This section illustrates how belief functions in the form of IV probabilities can be
constructed by the supremum and infimum of a class of probability measures
describing the neighborhood, as defined in eq. (2.4.13) and eq. (2.4.14).

Definition 3.1. [Huber (1973)] Consider any set functions $\lambda$ and $v$ on $\mathcal{B}$. $v$ is
said to "dominate" $\lambda$, denoted by $v \gg \lambda$, when $v(A) \ge \lambda(A)$ for all $A \in \mathcal{B}$.

Let $\mathcal{P}_v = \{\, \pi \in \mathcal{M} \mid v \gg \pi \,\}$ be the set of all probability measures $\pi$
dominated by $v$. The following lemma from Huber and Strassen (1973) shows
the existence of a 2-alternating upper probability in $\mathcal{P}_v$.

Lemma 3.1. Let $v$ be 2-alternating. Then for every $A \in \mathcal{B}$ there exists a $\pi \in \mathcal{P}_v$
such that $\pi(A) = v(A)$. This implies that $v$ coincides with the upper probability
determined by $\mathcal{P}_v$.


Most of the proposals listed in Huber (1981), such as $\varepsilon$-contamination, total
variation, Prohorov distance, Kolmogorov distance, and Lévy distance, for
formalizing the notion of an inexactly specified probability measure lead to a set
$\mathcal{P}_v$ defined by a certain 2-alternating set function. The following models are the
ones which make sense in arbitrary probability spaces.

Let $\varepsilon$ and $\delta$ be fractions between 0 and 1, and $P_0$ denote an idealized
model distribution as an estimation of the actual distribution:

A. $\varepsilon$-contamination or gross error model:

$$\mathcal{P}_v = \{\, \pi \in \mathcal{M} \mid \pi = (1 - \varepsilon) P_0 + \varepsilon \Lambda,\ \Lambda \in \mathcal{M} \,\} \qquad (3.2.1)$$

For any non-empty set $A \in \mathcal{B}$,

$$v(A) = \sup_{\pi \in \mathcal{P}_v} \pi(A) = (1 - \varepsilon) P_0(A) + \varepsilon \qquad (3.2.2)$$

B. Total variation model:

$$\mathcal{P}_v = \{\, \pi \in \mathcal{M} \mid |\pi(A) - P_0(A)| \le \varepsilon \text{ for all } A \in \mathcal{B} \,\} \qquad (3.2.3)$$

For any non-empty set $A \in \mathcal{B}$,

$$v(A) = \sup_{\pi \in \mathcal{P}_v} \pi(A) = \min \{\, P_0(A) + \varepsilon,\ 1 \,\} \qquad (3.2.4)$$

For both cases, $v$ is the 2-alternating upper probability function, and the
conjugate lower probability function is obtained as $1 - v(A^c)$, where the superscript
c denotes the complement.

The $\varepsilon$-contamination model assumes that the actual probability has a
gross error with an arbitrary (unknown) distribution, instead of a strict parametric
model. The total variation model formalizes the possibility of unknown small
deviations from the idealized model $P_0$ by assigning a tolerance to it.
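As a rough numerical illustration of eqs. (3.2.2) and (3.2.4), the sketch below returns the interval [lower, upper] that each model assigns to a set A, given only the idealized probability $P_0(A)$ and the fraction $\varepsilon$. The function names are hypothetical, and the formulas are assumed to apply to a non-empty proper subset A, so that the complement is also non-empty and the lower bound follows from the conjugate relation $1 - v(A^c)$.

```python
def contamination_interval(p0_a, eps):
    """[lower, upper] for a non-empty proper subset under eq. (3.2.2)."""
    upper = (1.0 - eps) * p0_a + eps
    upper_complement = (1.0 - eps) * (1.0 - p0_a) + eps
    lower = 1.0 - upper_complement  # simplifies to (1 - eps) * p0_a
    return lower, upper


def total_variation_interval(p0_a, eps):
    """[lower, upper] for a non-empty proper subset under eq. (3.2.4)."""
    upper = min(p0_a + eps, 1.0)
    upper_complement = min((1.0 - p0_a) + eps, 1.0)
    lower = 1.0 - upper_complement  # simplifies to max(p0_a - eps, 0)
    return lower, upper


print(contamination_interval(0.6, 0.1))
print(total_variation_interval(0.6, 0.1))
```

For $P_0(A) = 0.6$ and $\varepsilon = 0.1$ the contamination model gives approximately [0.54, 0.64] and the total variation model approximately [0.5, 0.7]; both intervals widen as $\varepsilon$ grows.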

When applied to real problems, both models demand additional labor
to find an optimal value of $\varepsilon$. Different values of $\varepsilon$ will result in different IV probabilities.
Most of the algorithms for robust parameter estimation based on the above
models adopt iterative procedures [Eom (1986), Huber (1981)]. The iterative
procedures are not only computationally expensive but also raise the
further problem of proving convergence of the estimators.

In the following section, IV belief functions are derived from the likelihood
functions of observed data. Compared to the ones described in this section,
they require much less computation and have readily usable mathematical
formulas.


3.3. Belief Functions based on Likelihood Principle 

The belief functions described in this section depend on two underlying 
assumptions. Before the assumptions are listed, it is necessary to define the 
“consonance” of belief functions. 


Definition 3.2. [Shafer (1976a)] A belief function is said to be "consonant" if
its focal elements are nested, i.e., if for $A_i \subseteq \Omega$ (i = 1, ..., r) such that $m(A_i) > 0$ for
all i and $\sum_{i=1}^{r} m(A_i) = 1$, $A_i \subset A_j$ for any $i < j$.

A simple support function is consonant while the converse is not
necessarily true. The following lemma describes the nature of consonant belief
functions.

Lemma 3.2. [Shafer (1976a)] Suppose $Sp: 2^{\Omega} \to [0, 1]$ is a support function
and $Pl: 2^{\Omega} \to [0, 1]$ is the conjugate plausibility function. Then the following
assertions are all equivalent:

(1) $Sp$ is consonant.

(2) $Sp(A \cap B) = \min \{ Sp(A), Sp(B) \}$ for all $A, B \subseteq \Omega$

(3) $Pl(A \cup B) = \max \{ Pl(A), Pl(B) \}$ for all $A, B \subseteq \Omega$

(4) $Pl(A) = \max \{ Pl(\{\omega\}) : \omega \in A \}$ for all non-empty $A \subseteq \Omega$

Example 3.1. Let $\Omega = \{ \omega_1, \omega_2, \omega_3 \}$. Suppose a body of evidence E provides
basic probability numbers $m(\{\omega_1\}) = 0.5$, $m(\{\omega_1, \omega_2\}) = 0.2$, $m(\Omega) = 0.3$, and $m(A) = 0$
for all other subsets A of $\Omega$. Then the support function $Sp$ of E is consonant
and given as:

$Sp(\{\omega_1\}) = 0.5$    $Sp(\{\omega_2\}) = 0$    $Sp(\{\omega_3\}) = 0$

$Sp(\{\omega_1, \omega_2\}) = 0.7$    $Sp(\{\omega_1, \omega_3\}) = 0.5$    $Sp(\{\omega_2, \omega_3\}) = 0$

$Sp(\Omega) = 1$

The plausibility function $Pl$ of E is given as:

$Pl(\{\omega_1\}) = 1$    $Pl(\{\omega_2\}) = 0.5$    $Pl(\{\omega_3\}) = 0.3$

$Pl(\{\omega_1, \omega_2\}) = 1$    $Pl(\{\omega_1, \omega_3\}) = 1$    $Pl(\{\omega_2, \omega_3\}) = 0.5$

$Pl(\Omega) = 1$
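The values in Example 3.1 can be reproduced from the basic probability numbers alone. The sketch below assumes the usual relations between a basic probability assignment and its belief functions, namely that support sums the masses of focal elements contained in A and plausibility sums the masses of focal elements intersecting A; the helper names are illustrative.

```python
def support(m, a):
    """Sum of masses of focal elements contained in `a`."""
    a = frozenset(a)
    return sum(mass for focal, mass in m.items() if focal <= a)


def plausibility(m, a):
    """Sum of masses of focal elements that intersect `a`."""
    a = frozenset(a)
    return sum(mass for focal, mass in m.items() if focal & a)


def is_consonant(m):
    """True if the focal elements form a nested chain (Definition 3.2)."""
    focal = sorted((f for f, mass in m.items() if mass > 0), key=len)
    return all(x <= y for x, y in zip(focal, focal[1:]))


m = {
    frozenset({"w1"}): 0.5,
    frozenset({"w1", "w2"}): 0.2,
    frozenset({"w1", "w2", "w3"}): 0.3,
}
print(is_consonant(m))
print(support(m, {"w1", "w2"}), plausibility(m, {"w2", "w3"}))
```

The same two functions also confirm assertion (4) of Lemma 3.2 on this example: the plausibility of any set equals the largest singleton plausibility inside it.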


Now, suppose that the observations of a statistical experiment are
governed by one of a finite set of probability models $\{ p_\omega \mid \omega \in \Omega \}$, where $p_\omega$ is
an ordinary probability density function on $X$ given $\omega$. The linear plausibility
function based on this body of evidence is derived from the following
assumptions:

(1) the degree of plausibility of a singleton $\{\omega\}$, $\omega \in \Omega$, is proportional to $p_\omega$;

(2) the plausibility function is consonant.

The first assumption corresponds to our intuition that an observation $x \in X$
favors those elements of $\Omega$ which assign the greater chance to x. Shafer
claims that x should determine a plausibility function $Pl_x$ obeying

$$Pl_x(\{\omega\}) = C\, p_\omega(x) \quad \text{for all } \omega \in \Omega \qquad (3.3.1)$$

where the constant C does not depend on $\omega$. He further shows that the first
assumption, together with the second assumption of consonance, determines a
unique consonant plausibility function as

$$Pl_x(A) = \frac{\max \{ p_\omega(x) : \omega \in A \}}{\max \{ p_\omega(x) : \omega \in \Omega \}} \quad \text{for all non-empty } A \subseteq \Omega \qquad (3.3.2)$$

When A is a singleton, say $\{\omega'\}$, the consonant plausibility function gives the
relative likelihood of $\omega'$ to the most likely element in $\Omega$. The conjugate support
function is obtained by

$$Sp_x(A) = 1 - \frac{\max \{ p_\omega(x) : \omega \in \bar{A} \}}{\max \{ p_\omega(x) : \omega \in \Omega \}} \quad \text{for all non-empty } A \subset \Omega \qquad (3.3.3)$$
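The next theorem derives the consonant basic probability assignment.

Theorem 3.1. Suppose that $\Omega^{\circ} = \{ \omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(n)} \}$ is an ordered set of $\Omega$
such that $p_{\omega^{(i)}} \ge p_{\omega^{(j)}}$ for any $1 \le i < j \le n$. If $Sp_x$ based on the statistical evidence is
consonant, then it has the focal elements

$$A_k = \{ \omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(k)} \} \quad \text{for } k = 1, \ldots, n \qquad (3.3.4)$$

Proof. Let $m_x$ denote the basic probability function of $Sp_x$. For a singleton
subset A of $\Omega^{\circ}$,

$$m_x(A) = Sp_x(A) = \begin{cases} \dfrac{p_{\omega^{(1)}}(x) - p_{\omega^{(2)}}(x)}{p_{\omega^{(1)}}(x)} & \text{if } A = \{\omega^{(1)}\} \\ 0 & \text{otherwise} \end{cases} \qquad (3.3.5)$$

Thus, $A_1 = \{\omega^{(1)}\}$ is the smallest focal element of $Sp_x$. For any $A \subseteq \Omega^{\circ}$ ($|A| = 2$),
eq. (2.2.13) gives

$$m_x(A) = \begin{cases} \dfrac{p_{\omega^{(2)}}(x) - p_{\omega^{(3)}}(x)}{p_{\omega^{(1)}}(x)} & \text{if } A = \{\omega^{(1)}, \omega^{(2)}\} \\ 0 & \text{otherwise} \end{cases} \qquad (3.3.6)$$

Let $A = \{ \omega^{(1)}, \ldots, \omega^{(i-1)}, \omega^{(i+1)}, \ldots, \omega^{(k)} \}$ for $3 \le k \le n$. Then

$$m_x(A) = \sum_{B \subseteq A} (-1)^{|A-B|} Sp_x(B)$$

$$= \sum_{B \subseteq A - \{\omega^{(k)}\}} \left[ (-1)^{|A-B|} Sp_x(B) + (-1)^{|A-B|-1} Sp_x(B \cup \{\omega^{(k)}\}) \right]$$

$$= - m_x(A - \{\omega^{(k)}\}) + \sum_{B \subseteq A - \{\omega^{(k)}\}} (-1)^{|A-B|-1} Sp_x(B \cup \{\omega^{(k)}\})$$

$$= - m_x(A - \{\omega^{(k)}\}) + m_x(A - \{\omega^{(k)}\})$$

$$= 0$$

where the last sum reduces to $m_x(A - \{\omega^{(k)}\})$ because, with $\omega^{(i)}$ missing, adding
$\omega^{(k)}$ to B does not bring any further focal element inside it, so that
$Sp_x(B \cup \{\omega^{(k)}\}) = Sp_x(B)$.

For $A_k = \{ \omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(k)} \}$ ($3 \le k \le n-1$), eq. (2.2.13) gives non-zero basic
probability numbers

$$m_x(A_k) = \frac{p_{\omega^{(k)}}(x) - p_{\omega^{(k+1)}}(x)}{p_{\omega^{(1)}}(x)} \qquad (3.3.7)$$

And,

$$m_x(\Omega^{\circ}) = 1 - \sum_{k=1}^{n-1} m_x(A_k) = \frac{p_{\omega^{(n)}}(x)}{p_{\omega^{(1)}}(x)} \qquad (3.3.8)$$

Hence, the basic probability function $m_x$ of $Sp_x$ is given as

$$m_x(A) = \begin{cases} \dfrac{p_{\omega^{(k)}}(x) - p_{\omega^{(k+1)}}(x)}{p_{\omega^{(1)}}(x)} & \text{for } A = \{\omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(k)}\}\ (1 \le k \le n-1) \\ \dfrac{p_{\omega^{(n)}}(x)}{p_{\omega^{(1)}}(x)} & \text{for } A = \Omega^{\circ}\ (= \Omega) \\ 0 & \text{otherwise} \end{cases} \qquad (3.3.9)$$

and the focal elements of $Sp_x$ are $A_k = \{ \omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(k)} \}$ for k = 1, ..., n.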
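Equation (3.3.9) amounts to sorting the classes by likelihood and taking scaled successive differences. A minimal sketch follows; the function name and the likelihood values are purely illustrative, and classes with tied likelihoods simply yield no focal element at the tied position.

```python
def consonant_masses(likelihoods):
    """Map each nested focal element A_k to its mass, following eq. (3.3.9)."""
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    top = likelihoods[ranked[0]]
    if top <= 0:
        raise ValueError("the largest likelihood must be positive")
    masses = {}
    for k in range(1, len(ranked) + 1):
        current = likelihoods[ranked[k - 1]]
        # For the last element the "next" likelihood is taken as 0,
        # which gives the mass p(n) / p(1) of the whole frame.
        following = likelihoods[ranked[k]] if k < len(ranked) else 0.0
        mass = (current - following) / top
        if mass > 0:
            masses[frozenset(ranked[:k])] = mass
    return masses


example = {"a": 0.4, "b": 0.2, "c": 0.1}  # illustrative likelihoods p_w(x)
for focal, mass in consonant_masses(example).items():
    print(sorted(focal), mass)
```

The masses always sum to one because the differences telescope to $p_{\omega^{(1)}}(x) / p_{\omega^{(1)}}(x)$.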


Although the consonant belief function described above is simple to
implement, its application is limited to the particular cases where the
consonance assumption is satisfied. Indeed, Shafer made a remark regarding
his method: "these assumptions must be regarded as conventions for
establishing degrees of support, conventions that can be justified only by their
general intuitive appeal and by their success in dealing with particular
examples." [Shafer (1976a)]

A generalized scheme of the consonant support and plausibility functions 
can be formulated by weakening the consonance assumption. 

Definition 3.3. A support function $Sp: 2^{\Omega} \to [0, 1]$ is said to be "partially
consonant" if there exists a partition $\{ \mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_r \}$ of $\Omega$ and $Sp$ is consonant
in every $\mathcal{W}_k$ for k = 1, ..., r.

In the problem of classifying remotely sensed data, $\Omega$ represents a set of
information classes. The information classes in remote sensing can be
partitioned into major ground-cover types, e.g., soil, vegetation, and water
[Swain et al. (1978)]. This hierarchical structure of the information classes
motivates the partitioning of $\Omega$ for partial consonance.

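The following theorem and lemma derive the partially consonant basic
probability assignment and the corresponding interval-valued probabilities.

Theorem 3.2. Suppose that $Sp$ is partially consonant on a partition $\{ \mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_r \}$
of $\Omega$. Let $\mathcal{W}_k^{\circ} = \{ \omega_k^{(1)}, \omega_k^{(2)}, \ldots, \omega_k^{(n_k)} \}$ denote an ordered set of $\mathcal{W}_k$ such
that $p_{\omega_k^{(i)}} \ge p_{\omega_k^{(j)}}$ for any $1 \le i < j \le n_k$, where $\sum_{k=1}^{r} n_k = n$. Then the basic probability
function m of $Sp$ is given as

$$m(A) = \begin{cases} C_p \left( p_{\omega_k^{(l)}} - p_{\omega_k^{(l+1)}} \right) & \text{for } A = \{ \omega_k^{(1)}, \ldots, \omega_k^{(l)} \}\ (1 \le l \le n_k - 1) \\ C_p\, p_{\omega_k^{(n_k)}} & \text{for } A = \mathcal{W}_k^{\circ} \\ 0 & \text{otherwise} \end{cases} \quad \text{for } 1 \le k \le r \qquad (3.3.10)$$

where

$$C_p = \left[ \sum_{k=1}^{r} \max \{ p_\omega : \omega \in \mathcal{W}_k \} \right]^{-1} \qquad (3.3.11)$$

Proof. Since $Sp$ is partially consonant on $\{ \mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_r \}$, it is consonant in
every $\mathcal{W}_k$ for k = 1, ..., r. Using eq. (3.3.9), we can derive eq. (3.3.10). To prove
this theorem, it is sufficient to derive eq. (3.3.11). Within each $\mathcal{W}_k$ the masses
telescope to $C_p\, p_{\omega_k^{(1)}}$, so

$$1 = \sum_{A \subseteq \Omega} m(A) = \sum_{k=1}^{r} \Big\{ \sum_{A \subseteq \mathcal{W}_k} m(A) \Big\} = C_p \cdot \sum_{k=1}^{r} \max \{ p_\omega : \omega \in \mathcal{W}_k \}$$

$$C_p = \left[ \sum_{k=1}^{r} \max \{ p_\omega : \omega \in \mathcal{W}_k \} \right]^{-1}$$

Thus the theorem is proved. ■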

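Lemma 3.3. The partially consonant plausibility function and support function
corresponding to eq. (3.3.10) are

$$Pl(A) = C_p \sum_{k=1}^{r} \max \{ p_\omega : \omega \in A \cap \mathcal{W}_k \} \qquad (3.3.12)$$

$$= \sum_{k=1}^{r} \max \{ Pl(\{\omega\}) : \omega \in A \cap \mathcal{W}_k \} \qquad (3.3.13)$$

$$Sp(A) = C_p \sum_{k=1}^{r} \left[ \max \{ p_\omega : \omega \in \mathcal{W}_k \} - \max \{ p_\omega : \omega \in \bar{A} \cap \mathcal{W}_k \} \right] \qquad (3.3.14)$$

where the maximum over an empty set is taken as 0.

Proof. Use eq. (2.2.13) and eq. (2.2.14). ■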

Partial consonance is weaker than consonance in the sense that it
includes consonance when r = 1, i.e., the partition of $\Omega$ is $\Omega$ itself. In the other
extreme case where r = n, i.e., the partition consists of n singleton subsets of $\Omega$,
the partially consonant support function becomes the Bayesian probability
function ($Sp(\{\omega_i\}) = Pl(\{\omega_i\}) = m(\{\omega_i\})$ for i = 1, ..., n). While partial consonance
gives a flexibility to Shafer's linear plausibility method, it raises the problem of
how to determine the optimal partition of $\Omega$; i.e., the partition which will give the
best classification accuracy. In practice, the partition must be chosen based on
the relationship among the classes in the application at hand.

Example 3.2. Let $\Omega = \{ \omega_1, \omega_2, \omega_3, \omega_4 \}$. Suppose that a single observation x
provides $p_{\omega_1}(x) = 0.5$, $p_{\omega_2}(x) = 0.3$, $p_{\omega_3}(x) = 0.15$, and $p_{\omega_4}(x) = 0.05$. Table 3.1
shows the values of $m_x$, $Sp_x$, and $Pl_x$ for all subsets of $\Omega$ in both cases of
consonance and partial consonance on the partition $\{ \{\omega_1, \omega_2\}, \{\omega_3, \omega_4\} \}$.

It is very interesting that both intervals given by the belief functions
contain the additive probability ($P_A(x)$) for every A except $\{\omega_1, \omega_2\}$ and $\{\omega_3, \omega_4\}$ in
partial consonance. Compared with the consonance case, the partially
consonant belief function always provides intervals of less width,
correspondingly less degrees of uncertainty. It means that the assumption of
partial consonance requires more knowledge about a given body of evidence.
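The entries of Table 3.1 can be recomputed from the four likelihoods of Example 3.2. The sketch below follows eq. (3.3.12) for plausibility and obtains support through the conjugate relation $Sp(A) = 1 - Pl(\bar{A})$; the function names and the list-of-sets representation of the partition are illustrative choices. Passing a one-block partition recovers the consonant case.

```python
def partial_plausibility(likelihoods, partition, a):
    """Pl(A) = C_p * sum over blocks of max likelihood in A within the block."""
    c_p = 1.0 / sum(max(likelihoods[w] for w in block) for block in partition)
    a = set(a)
    total = 0.0
    for block in partition:
        common = a & set(block)
        if common:  # an empty intersection contributes 0
            total += max(likelihoods[w] for w in common)
    return c_p * total


def partial_support(likelihoods, partition, a):
    """Sp(A) = 1 - Pl(complement of A)."""
    omega = set().union(*map(set, partition))
    return 1.0 - partial_plausibility(likelihoods, partition, omega - set(a))


p = {"w1": 0.5, "w2": 0.3, "w3": 0.15, "w4": 0.05}
two_blocks = [{"w1", "w2"}, {"w3", "w4"}]
one_block = [{"w1", "w2", "w3", "w4"}]

for name, part in (("consonant", one_block), ("partial", two_blocks)):
    print(name,
          round(partial_support(p, part, {"w1"}), 2),
          round(partial_plausibility(p, part, {"w1"}), 2))
```

For the singleton $\{\omega_1\}$ this gives the interval [0.4, 1.0] under consonance and roughly [0.31, 0.77] under partial consonance, which shows the narrowing of intervals discussed above.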

Note that low $Sp$ does not necessarily imply low $Pl$, whereas high $Sp$ always
implies high $Pl$. We can also observe two relations: (1) $Sp(A) + Sp(\bar{A}) \le 1$, and (2)
$Pl(A) + Pl(\bar{A}) \ge 1$ for every A. The first relation indicates that it is hardly possible
for both A and $\bar{A}$ to be well supported, and the second one is interpreted as
either one of A and $\bar{A}$ or possibly both must be highly plausible.

The belief functions described in this section are considered to be based
on the Likelihood Principle because they are expressed in terms of likelihood
functions, eq. (3.3.2), (3.3.3), (3.3.12), and (3.3.14). They are obtained by
transforming the assessment of statistical evidence already in the form of point-valued
likelihood functions into interval-valued probability models.


Table 3.1. Consonant and Partially Consonant Belief Functions
based on a Single Observation.

                              Consonance             Partial Consonance
A                P_A(x)    m_x    Sp_x   Pl_x      m_x    Sp_x   Pl_x
{ω1}              0.50     0.4    0.4    1.0       0.31   0.31   0.77
{ω2}              0.30     0.0    0.0    0.6       0.00   0.00   0.46
{ω3}              0.15     0.0    0.0    0.3       0.15   0.15   0.23
{ω4}              0.05     0.0    0.0    0.1       0.00   0.00   0.08
{ω1, ω2}          0.80     0.3    0.7    1.0       0.46   0.77   0.77
{ω1, ω3}          0.65     0.0    0.4    1.0       0.00   0.46   1.00
{ω1, ω4}          0.55     0.0    0.4    1.0       0.00   0.31   0.85
{ω2, ω3}          0.45     0.0    0.0    0.6       0.00   0.15   0.69
{ω2, ω4}          0.35     0.0    0.0    0.6       0.00   0.00   0.54
{ω3, ω4}          0.20     0.0    0.0    0.3       0.08   0.23   0.23
{ω1, ω2, ω3}      0.95     0.2    0.9    1.0       0.00   0.92   1.00
{ω1, ω2, ω4}      0.85     0.0    0.7    1.0       0.00   0.77   0.85
{ω1, ω3, ω4}      0.70     0.0    0.4    1.0       0.00   0.54   1.00
{ω2, ω3, ω4}      0.50     0.0    0.0    0.6       0.00   0.23   0.69
Ω                 1.00     0.1    1.0    1.0       0.00   1.00   1.00


3.4. Representation of Source Reliability

Since information sources in remote sensing and GIS are in general not 
equally reliable, they usually provide various degrees of support for an event. 
In order to incorporate a relative quality factor, so-called “degree of reliability,” of 
individual data sources into the combination of multiple evidence, reliability 
should be represented quantitatively. Although the belief functions in the form 
of IV probabilities are useful to represent the uncertainty in describing the 
degrees of support for individual events, they do not take into account the 
relative source reliability representing a body of evidence as a whole. 

As a simple example, consider a problem of classifying a pixel using two
data sources as depicted in Figure 3.1. Let $X_1$ and $X_2$ be the vectors of the pixel
obtained from Source 1 and Source 2 respectively. Based on Source 1 alone,
the pixel seems to belong to $\omega_1$ while according to the other source it is more
likely to come from $\omega_2$. If there is a priori information concerning how reliable
each data source is, it would be reasonable to make a decision on the
classification of the pixel using the source reliabilities as well as the
probabilistic information from both sources.

Benediktsson and Swain (1989) have used three statistical measures, 
overall classification accuracy, weighted average separability, and 
equivocation, to quantify reliability of sources in the classification of multisource 
data. Which measure should be applied to a particular problem depends on the 
meaning of the reliability of a source in the context of the problem, that is, the 
sense in which the source is called reliable. For the problem of multisource 
data classification, it is quite natural that a source is called reliable when it gives 
higher classification accuracy. Measuring reliability of a source based on 
classification accuracy is straightforward. It is usually computed from the overall 
classification accuracy over a representative set of training samples. 

A statistical separability measure such as Jeffries-Matusita (J-M) 
distance, Bhattacharyya distance, or (Transformed) Divergence is an alternative 
to the numerical representation of source reliability assuming that a data source 
provides higher classification accuracy when information classes are more 
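separable in the source. For example, the J-M distance defined as follows is a
measure of statistical separability of pairs of classes:

$$J_{ij} = \left\{ \int_X \left[ \sqrt{p(X \mid \omega_i)} - \sqrt{p(X \mid \omega_j)} \right]^2 dX \right\}^{1/2} \qquad (3.4.1)$$

where $p(X \mid \omega_i)$ is the probability density function of class $\omega_i$. When each class is
assumed to have a normal density function with mean vector $M_i$ and covariance
matrix $\Sigma_i$ (i = 1, ..., n), the above equation is reduced to

$$J_{ij} = \sqrt{2 \left( 1 - \exp(-\beta_{ij}) \right)} \qquad (3.4.2)$$

where $\beta_{ij}$ is the Bhattacharyya distance between $\omega_i$ and $\omega_j$ defined as:

$$\beta_{ij} = \frac{1}{8} (M_i - M_j)^T \left[ \frac{\Sigma_i + \Sigma_j}{2} \right]^{-1} (M_i - M_j) + \frac{1}{2} \log_e \left[ \frac{\left| \frac{1}{2} (\Sigma_i + \Sigma_j) \right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}} \right] \qquad (3.4.3)$$

The average J-M distance over all class pairs is given as:

$$J_{av} = \sum_{i=1}^{n} \sum_{j=1}^{n} P(\omega_i)\, P(\omega_j)\, J_{ij} \qquad (3.4.4)$$

where $P(\omega_i)$ is the prior probability of $\omega_i$.

For the normal distribution case, Transformed Divergence between $\omega_i$
and $\omega_j$ is defined as:

$$D^T_{ij} = 2 \left[ 1 - \exp\left( -\frac{D_{ij}}{8} \right) \right] \qquad (3.4.5)$$

where

$$D_{ij} = \frac{1}{2} \mathrm{tr}\left[ (\Sigma_i - \Sigma_j)(\Sigma_j^{-1} - \Sigma_i^{-1}) \right] + \frac{1}{2} \mathrm{tr}\left[ (\Sigma_i^{-1} + \Sigma_j^{-1})(M_i - M_j)(M_i - M_j)^T \right] \qquad (3.4.6)$$

Then the average Transformed Divergence over all class pairs is given as

$$D^T_{av} = \sum_{i=1}^{n} \sum_{j=1}^{n} P(\omega_i)\, P(\omega_j)\, D^T_{ij} \qquad (3.4.7)$$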
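To make the separability measures concrete, the sketch below evaluates them for univariate normal classes, where the mean vectors and covariance matrices of eqs. (3.4.3) and (3.4.6) reduce to scalar means and variances. This one-dimensional reduction, the function names, and the `(prior, mean, variance)` tuple layout are assumptions made for illustration; the multivariate case would need matrix inverses and determinants.

```python
import math


def bhattacharyya_1d(m1, v1, m2, v2):
    """Scalar form of eq. (3.4.3) for two normal classes."""
    mean_var = 0.5 * (v1 + v2)
    return ((m1 - m2) ** 2) / (8.0 * mean_var) + \
        0.5 * math.log(mean_var / math.sqrt(v1 * v2))


def jm_distance_1d(m1, v1, m2, v2):
    """Eq. (3.4.2): J-M distance from the Bhattacharyya distance."""
    return math.sqrt(2.0 * (1.0 - math.exp(-bhattacharyya_1d(m1, v1, m2, v2))))


def transformed_divergence_1d(m1, v1, m2, v2):
    """Scalar form of eqs. (3.4.5) and (3.4.6)."""
    d = 0.5 * (v1 - v2) * (1.0 / v2 - 1.0 / v1) \
        + 0.5 * (1.0 / v1 + 1.0 / v2) * (m1 - m2) ** 2
    return 2.0 * (1.0 - math.exp(-d / 8.0))


def average_over_pairs(classes, measure):
    """Prior-weighted double sum of eqs. (3.4.4) and (3.4.7).

    `classes` is a list of (prior, mean, variance) tuples.
    """
    return sum(pi * pj * measure(mi, vi, mj, vj)
               for pi, mi, vi in classes
               for pj, mj, vj in classes)


classes = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]
print(average_over_pairs(classes, jm_distance_1d))
print(average_over_pairs(classes, transformed_divergence_1d))
```

Both pairwise measures are zero for identical classes and saturate (at $\sqrt{2}$ and 2 respectively) as the classes move apart, which is why they are normalized before being used as reliability factors.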


Equivocation is the class separability measure corresponding to
Shannon's entropy measure [Devijver and Kittler (1982)]. Benediktsson et al.
(1989) use equivocation to measure the reliability with which classes
identifiable by means of each data source can be used to identify the
information classes of interest in a given application.

The three measures briefly reviewed above are related indirectly to the
classification accuracy of the source. Source reliability can have a slightly
different meaning in the mathematical framework of the theory of evidence. In
the previous example of Figure 3.1, assume that Source 1 is a main data source
and Source 2 an ancillary data source, and that the main source gives higher
classification accuracy over training samples. Then Source 2 can be
considered as reliable as Source 1 if there is little overall conflict between them
in providing evidence for classifying observations, and its reliability
decreases with the extent of its conflict with Source 1. The following
definition gives a notion of quantifying source reliability based on a measure of
the extent of the conflict between the belief functions provided by two entirely
distinct bodies of evidence.

Definition 3.4. [Shafer (1976a)] Assume that $Bel_1$ and $Bel_2$ are belief
functions provided by two bodies of evidence. Let $m_1$ and $m_2$ denote the basic
probability assignments of $Bel_1$ and $Bel_2$, respectively. The measure of conflict
between $Bel_1$ and $Bel_2$ is defined as:

$$k = \sum_{A_i \cap B_j = \emptyset} m_1(A_i) \cdot m_2(B_j) \qquad (3.4.8)$$

$k$ is a fraction between 0 and 1. When $Bel_1$ and $Bel_2$ have no conflict,
$k = 0$. If they are completely contradictory, $k = 1$. After $k$ is computed for every
pixel, the average measure of conflict between the sources is obtained as:

$$\mathcal{K} = E[k] = \int_0^1 k\, p(k)\, dk \qquad (3.4.9)$$

where $p(k)$ is the probability density function of $k$.
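The measure of conflict in eq. (3.4.8) is a sum of mass products over pairs of disjoint focal elements. The sketch below computes it for basic probability assignments stored as dictionaries keyed by frozensets; this representation and the function names are illustrative. For the average over pixels, a plain sample mean stands in for the expectation of eq. (3.4.9), which is an assumption of this sketch rather than a density-based estimate.

```python
def conflict(m1, m2):
    """Eq. (3.4.8): total mass product over disjoint pairs of focal elements."""
    return sum(mass_a * mass_b
               for a, mass_a in m1.items()
               for b, mass_b in m2.items()
               if not (a & b))


def average_conflict(pixel_pairs):
    """Sample mean of the per-pixel conflict (stand-in for eq. (3.4.9))."""
    values = [conflict(m1, m2) for m1, m2 in pixel_pairs]
    if not values:
        raise ValueError("at least one pixel is required")
    return sum(values) / len(values)


source1 = {frozenset({"w1"}): 0.6, frozenset({"w1", "w2"}): 0.4}
source2 = {frozenset({"w2"}): 0.7, frozenset({"w1", "w2"}): 0.3}
print(conflict(source1, source2))
```

Only the pair ({w1}, {w2}) is disjoint here, so the conflict is 0.6 times 0.7. Two assignments that each commit fully to different singletons give a conflict of 1, and any assignment paired with the vacuous one gives 0.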

In order to illustrate their uses and compare the performances, the
average J-M distance ($J_{av}$), the average Transformed Divergence ($D^T_{av}$), and the
average measures of conflict between pairs of sources in the Anderson River
data set were computed. The data set has 6 sources as shown in Table 3.2.
For more detail about this data set, see Section 6.2. For this experiment, six
information classes are defined. Each class has 100 training samples uniformly
scattered over the test fields. The first row in Table 3.2 shows the overall
classification accuracy (OCA) over the training samples using the Maximum
Likelihood classification. Although most of the classes are not normally
distributed in the topographic data sources (see Figures 6.9 through 6.12), they
were assumed to be so in the calculations. The maximum values of $J_{av}$ and
$D^T_{av}$ are $\sqrt{2}$ and 2, respectively. When they are directly used as measures of source
reliability, they should be divided by the corresponding maximum value so that
their maximum is 1. Table 3.2 shows that the separability measures agree with
the overall classification accuracy in ranking the sources for their relative
reliabilities. Based on the measures in Table 3.2, the sources can be ranked
from best to worst as A/B MSS, Elevation, SAR-Shallow, SAR-Steep, Aspect,
and Slope.


Table 3.2 Overall Classification Accuracy (OCA), Average J-M Distance
($J_{av}$), and Average Transformed Divergence ($D^T_{av}$) of Sources in
Anderson River Data Set (Training Samples).

             A/B MSS   SAR-Shallow   SAR-Steep   Aspect   Elevation   Slope
OCA (%)       83.5        34.7         33.5       30.3      45.8       29.2
J_av          1.09         .57          .49        .35       .66        .21
D^T_av        1.58         .52          .40        .32       .82        .08


The average measures of conflict between pairs of sources in the same 
data set were computed for the training samples and the combined training and 
test samples, and the results are listed in Table 3.3 and 3.4, respectively. The 
type of the belief function used was the consonant belief function. Since the 
probability density function of £in eq. (3.4.8) was not known, the histogram 
approach was used to estimate p(^). The results show that Elevation and SAR- 
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Shallow sources have less conflict with A/B MSS in providing bodies of 
evidence, compared to the remaining sources. Knowing that A/B MSS source 
gives the highest overall classification accuracy, relative degrees of reliability of 
the other sources can be assigned according to their measures of conflict with 
A/B MSS such that the less conflicting, the more reliable. Thus the sources can 
be ranked from best to worst as A/B MSS, Elevation, SAR-Shallow, Aspect, 


Table 3.3 Average Measures of Conflict between Pairs
          of Sources in Anderson River Data Set.
          (Using Consonant Belief Function with Training Samples)

               SAR-Shallow   SAR-Steep   Aspect   Elevation   Slope
A/B MSS           .388         .586       .543      .327      .565
SAR-Shallow                    .269       .387      .429      .404
SAR-Steep                                 .436      .437      .341
Aspect                                              .588      .463
Elevation                                                     .543


Table 3.4 Average Measures of Conflict between Pairs
          of Sources in Anderson River Data Set.
          (Using Consonant Belief Function with All Samples)

               SAR-Shallow   SAR-Steep   Aspect   Elevation   Slope
A/B MSS           .407         .585       .538      .351      .550
SAR-Shallow                    .284       .385      .453      .385
SAR-Steep                                 .437      .462      .344
Aspect                                              .572      .428
Elevation                                                     .513


Slope, and SAR-Steep. The average measure of conflict agrees with the
separability measures and OCA only in ranking the first three sources (A/B 
MSS, Elevation, and SAR-Shallow). In the multisource data classification with
this data set, the remaining sources (SAR-Steep, Aspect, and Slope) will
therefore be treated as equally reliable, all sharing the fourth rank.

There are two problems in quantifying source reliability based on the 
average measure of conflict. First, the values of the average measures of 
conflict will vary depending on what kind of belief function is used in eq. (3.4.8). 
However, as long as the belief function represents the body of evidence 
properly, the ranking of the sources in terms of their relative reliabilities will 
remain the same. Second, even the ranking of the sources depends on the 
prior information regarding which is the most reliable source. For example, in 
Table 3.4, if SAR-Shallow were assumed to be the most reliable, then the 
second most reliable source would be SAR-Steep instead of A/B MSS. 

One of the advantages of the measure of conflict is that it provides the 
relative reliabilities between all pairs of sources. When the “most reliable”
source changes from one to another due to the meaning of the reliability in the
context of a problem, the measure of conflict gives the ranking of the sources 
according to the new most reliable source. 

Furthermore, the measure of conflict can be computed for test samples as 
well as training samples. In the above case, there is not much difference 
between the measures of conflict for the training samples and the entire sample 
because the training samples are uniformly distributed over the entire sample. 
On the other hand, when training samples are limited and poorly represent the
test samples, there may be a difference between the measures of conflict
obtained from the training samples and from the entire sample.

Both the separability measures and the measure of conflict give 
information for ranking multiple sources in the sense of their relative reliabilities, 
but a quantitative method of computing the absolute reliabilities of the sources 
is still unknown. 

Once the relative reliabilities of the data sources are given, they are 
included in the multisource data analysis by “discounting” belief functions 
[Shafer (1976a)]. Suppose $\alpha$ denotes the relative reliability assigned to a given
source, where $0 \le \alpha \le 1$. By discounting, the basic probability number of every
proper subset $A$ of $\Omega$ is reduced from $m(A)$ to $\alpha \cdot m(A)$, and the basic probability number
of $\Omega$ increases from $m(\Omega)$ to $\alpha \cdot m(\Omega) + (1 - \alpha)$, so that the total mass remains one.
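The discounting step can be sketched in Python as follows. This is only an illustration and not code from this report: the function name `discount`, the dictionary representation of a basic probability assignment (focal elements stored as `frozenset` keys), and the example masses are assumptions made here. The sketch also assumes the convention that the mass removed from the proper subsets is transferred to the frame of discernment, so that the total mass stays equal to one.

```python
def discount(m, frame, alpha):
    """Discount a basic probability assignment by a reliability alpha in [0, 1].

    m     : dict mapping frozenset focal elements to masses that sum to 1
    frame : frozenset, the frame of discernment (Omega)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = {}
    for focal, mass in m.items():
        if focal != frame:
            out[focal] = alpha * mass  # proper subsets shrink to alpha * m(A)
    # Omega absorbs the removed mass so that the total stays equal to one.
    out[frame] = alpha * m.get(frame, 0.0) + (1.0 - alpha)
    return out


# Hypothetical example: three classes, reliability 0.8.
omega = frozenset({"A1", "A2", "A3"})
m = {frozenset({"A1"}): 0.6, frozenset({"A2"}): 0.3, omega: 0.1}
m_discounted = discount(m, omega, alpha=0.8)
```

With `alpha = 1` the assignment is returned unchanged, and with `alpha = 0` all mass moves to the frame, which corresponds to a source that is ignored.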


3.5. Summary 

This chapter has focused on the construction of interval-valued belief 
functions for statistical evidence and the quantitative representation of source 
reliability. Belief functions can be obtained in the form of IV probabilities from 
the supremum and infimum of a class of probability measures. Two models for
robust estimation of probability measures, the $\epsilon$-contamination model and the
total variation model, were introduced to formalize the class of probability
measures. Then the IV belief functions based on the Likelihood Principle were 
constructed. Although they require some underlying assumptions (consonance 
or partial consonance), they have mathematically simple and readily usable 
formulas. The required assumptions are not difficult to satisfy in practical 
applications of this approach. 

In order to include the relative reliabilities of sources in a multisource
data analysis, attempts were made to represent the degree of reliability
quantitatively by the average Jeffries-Matusita distance, the average
Transformed Divergence, and the average measure of conflict between pairs
of sources. Their performances were compared by applying them to an actual
multisource data set.

In the experiments described in Chapter 6, the belief functions based on 
the Likelihood Principle will be implemented, and the multiple sources will be 
ranked based on the average J-M distance and the average measure of conflict. 

In this chapter, the contribution of this research is in the representation of 
statistical evidence by IV probabilities such as consonant and partially 
consonant IV probabilities. Theorems 3.1 and 3.2, Definition 3.3, and Lemma 
3.3 are originally formulated and proved in this report.


CHAPTER 4

COMBINATION OF BELIEF FOR STATISTICAL EVIDENCE 


4.1. Introduction 

To base inferences and decisions on all available information, it is
necessary to combine the information from various sources. The role of rules 
for combining evidence is to integrate the conditional knowledge about states of 
nature based on each body of evidence into combined knowledge based on the 
total evidence. Combination rules may be formulated in various ways; they may 
depend on the characteristics of the problem, the experience of the knowledge 
engineer, and the mathematical theories on which the rules are founded. 

Various procedures for forming a consensus of opinions have
been suggested for group decision problems [French (1981), Genest (1986),
and Winkler (1968)], some on pragmatic grounds, others justified axiomatically.
The following formulas are the most typical among them.

Consider the situation where there are $m$ sources of information, each
providing its subjective probability $\pi_i$ ($i = 1, \ldots, m$) over $\mathcal{B}$. Here $\pi_i$ can be any kind
of additive probability measure, depending on the context of the problem.

Linear Opinion Pool defines the overall probability measure $\pi$ as a
weighted mean of the $\pi_i$'s:

    $\pi(A) = \sum_{i=1}^{m} \gamma_i \, \pi_i(A)$  for all $A \in \mathcal{B}$    (4.1.1)

where $\gamma_i$ ($i = 1, \ldots, m$) are positive weights assigned to each source,
satisfying $\sum_{i=1}^{m} \gamma_i = 1$.

Independent Opinion Pool assumes that the information sources are
“independent” and defines the overall probability measure simply as a product 
of the individual measures: 


    $\pi(A) = \kappa \prod_{i=1}^{m} \pi_i(A)$  for all $A \in \mathcal{B}$    (4.1.2)

where $\kappa$ is an appropriate normalizing constant so that $\pi(\cdot)$ becomes additive.

Logarithmic Opinion Pool is a generalization of the independent opinion 
pool. The overall probability measure is given as: 

    $\pi(A) = \kappa \prod_{i=1}^{m} [\pi_i(A)]^{\alpha_i}$  for all $A \in \mathcal{B}$    (4.1.3)

where $\alpha_i$ is any positive real number representing the relative reliability of the $i$th
source.


A deficiency of the linear opinion pool is that the individual probabilities
do not reinforce one another. The combined measure given in (4.1.1) always lies
between the minimum and the maximum values of the $\pi_i$:

    $\min_{i=1,\ldots,m} \pi_i(A) \le \pi(A) \le \max_{i=1,\ldots,m} \pi_i(A)$  for all $A \in \mathcal{B}$    (4.1.4)

The other two schemes have the “zero probability property”, viz.,

    if $\pi_i(A) = 0$ for any $i$, then $\pi(A) = 0$    (4.1.5)

which makes the combined measure too sensitive to a small probability
measure. More in-depth discussions are found in French (1985) and Berger
(1985).
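The three pooling formulas can be compared with a short Python sketch. It is an illustration under stated assumptions rather than an implementation from this report: each source is assumed to supply a probability vector over the same finite list of mutually exclusive states, the product pools are normalized over those states, and the function names and the example numbers are hypothetical.

```python
import math


def linear_pool(dists, weights):
    """Weighted mean of the sources' probabilities, as in eq. (4.1.1)."""
    n = len(dists[0])
    return [sum(w * d[j] for w, d in zip(weights, dists)) for j in range(n)]


def logarithmic_pool(dists, exponents):
    """Normalized product of powers of the sources' probabilities, as in eq. (4.1.3)."""
    n = len(dists[0])
    raw = [math.prod(d[j] ** a for a, d in zip(exponents, dists)) for j in range(n)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no state has positive probability under every source")
    return [r / total for r in raw]


def independent_pool(dists):
    """Eq. (4.1.2): the logarithmic pool with all exponents equal to one."""
    return logarithmic_pool(dists, [1.0] * len(dists))


# Hypothetical example with two sources and three states.
source_1 = [0.7, 0.3, 0.0]
source_2 = [0.6, 0.2, 0.2]
pooled_linear = linear_pool([source_1, source_2], [0.5, 0.5])
pooled_independent = independent_pool([source_1, source_2])
```

In this example the linear pool stays between the two opinions for every state, whereas the independent pool pushes the first state above both individual values and sets the third state to zero because one source assigns it zero probability.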

In rule-based inferencing systems, several subjective Bayesian updating 
rules have been proposed to modify the probabilities of hypotheses as each 
piece of evidence is provided. These rules are derived by applying one or two 
statistical independence assumptions to Bayes’ rule and have been successfully used in
rule-based expert systems such as PROSPECTOR [Duda et al. (1979)] and
MYCIN [Shortliffe (1976)]. However, there have been some controversies over 
the inconsistency between the independence assumptions and their updating 
rules. 

During the last decade Dempster’s rule has been receiving more 
attention from many researchers in various areas of science and engineering. It 
is a generalization of Bayesian inference, including the subjective Bayesian
updating rules as special cases in which the domain-specific knowledge is
precise.

The objective of this chapter is to investigate the inferencing mechanisms
of the subjective Bayesian updating rules and Dempster’s rule in combining
multiple bodies of evidence when they are formulated as set-theoretic functional
equations. They are given a behavioral interpretation in terms of the desirable
properties which agree with human intuition. The independence assumptions
underlying them and the robustness to small variations in probability measures
are studied.


4.2. Properties of Combination Rules 

For computer-based, quantitative techniques of multisource data analysis 
the rules for combining evidence must be formulated as functional equations 
computing the degree of belief based on the total evidence from degrees of 
belief based on each single piece of evidence. 

As given earlier, $\Omega$ consists of a finite number of exhaustive and mutually
exclusive events and $\mathcal{B}$ is a Boolean algebra of all subsets of $\Omega$. Let $\mathcal{E}$ be a set
of multiple bodies of evidence $\{E_1, E_2, \ldots, E_m\}$ and $B(A_j|E_i) = b_i$ ($i = 1, \ldots, m$)
denote the degree of conditional belief for $A_j \in \mathcal{B}$ given a body of evidence $E_i$.
Then a rule for combining evidence expresses the degree of belief based on
the total evidence, $B(A_j|E_1 \& E_2 \& \ldots \& E_m)$, as a function on the set of evidence
given the knowledge of $B(A_j|E_i)$ for $i = 1, \ldots, m$. Several properties of combining
rules are proposed by Cheng and Kashyap (1986) to provide guidelines for 
constructing the rules as numerical formulas. In this section those properties 
are formally stated.

Definition 4.1. Let $\mathcal{F}$ denote a function representing a rule for combining
evidence. $\mathcal{F}$ is said to be “decomposable” if there exists a function $f$ such that

    $\mathcal{F}(b_1, b_2, \ldots, b_m) = f(f(\ldots f(f(b_1, b_2), b_3), \ldots), b_m)$    (4.2.1)

where $f$ is called a “binary operator” of $\mathcal{F}$.

In general, $\mathcal{F}$ and $f$ (if it exists) are assumed to be continuous except at
the endpoints. This corresponds to the idea that the human reasoning process
is not abrupt.

If we assume that the final degree of belief depends only on the set of 
evidence and not on the order in which the pieces of evidence are combined, 
different orderings of evidence in combination should produce the same result. 
The properties in the following definitions are essential to any combination rule 
for exchangeability of the order of evidence and for decomposability of its 
numerical function into a binary operator. 

Definition 4.2. $\mathcal{F}$ is “commutative” if it has a binary operator $f$ such that

    $f(b_i, b_j) = f(b_j, b_i)$    (4.2.2)

for any pair of $i$, $j$ ($1 \le i, j \le m$).

Definition 4.3. $\mathcal{F}$ is “associative” if it has a binary operator $f$ such that

    $f(f(b_i, b_j), b_k) = f(b_i, f(b_j, b_k))$    (4.2.3)

for all $i$, $j$, and $k$ ($1 \le i, j, k \le m$).


In every numerical representation of belief, a stronger belief is 
represented by a larger number. Imagine that two degrees of belief provided by 
different pieces of evidence, say $b_i$ and $b_j$, are to be combined respectively with
another degree of belief $b_k$. Suppose $b_i > b_j$, i.e., $b_i$ represents a relatively
stronger belief than $b_j$; then it is natural that the combination of $b_i$ with $b_k$
produces a larger number than the combination of $b_j$ with $b_k$. The next definition
gives the mathematical expression of this property.

Definition 4.4. $\mathcal{F}$ is said to be “monotonic” if its binary operator $f$ satisfies
the condition

    if $b_i > b_j$, then $f(b_i, b_k) > f(b_j, b_k)$    (4.2.4)

for any $b_k$.

Monotonicity is a rather general property compared to commutativity and
associativity because it should hold even for combining functions which do not
have binary operators. When one piece of evidence is replaced by
one providing stronger belief, $\mathcal{F}$ should produce a larger value.


Definition 4.5. $\mathcal{F}$ is “positively reinforcing” if

    $\mathcal{F}(b_1, \ldots, b_m) \ge \max_{i=1,\ldots,m} \{b_i\}$    (4.2.5)

or its binary operator $f$ satisfies

    $f(b_i, b_j) \ge \max\{b_i, b_j\}$    (4.2.6)

Definition 4.6. $\mathcal{F}$ is “negatively reinforcing” if

    $\mathcal{F}(b_1, \ldots, b_m) \le \min_{i=1,\ldots,m} \{b_i\}$    (4.2.7)

or its binary operator $f$ satisfies

    $f(b_i, b_j) \le \min\{b_i, b_j\}$    (4.2.8)


Positive (Negative) reinforcement means that the belief based on the total 
evidence is stronger (weaker) than the belief based on any single piece of 
evidence. 

In the following two sections, these definitions of the desirable properties of a
combination rule are used to interpret the inferencing mechanisms of the
subjective Bayesian updating rules and Dempster’s rule of combination.
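The properties defined in this section can be checked numerically for a candidate binary operator. The sketch below is an illustration only: the helper `check_properties`, the grid of belief values, and the example operator `probabilistic_sum` are assumptions introduced here, the example operator is not one of the rules studied in this chapter, and monotonicity is tested in its weak (greater than or equal) form. A grid test can refute a property but cannot prove it.

```python
import itertools


def check_properties(f, grid, tol=1e-9):
    """Numerically test a binary operator f on a finite grid of belief values."""
    pairs = list(itertools.product(grid, repeat=2))
    triples = list(itertools.product(grid, repeat=3))
    return {
        "commutative": all(abs(f(a, b) - f(b, a)) <= tol for a, b in pairs),
        "associative": all(
            abs(f(f(a, b), c) - f(a, f(b, c))) <= tol for a, b, c in triples
        ),
        # Weak form: a stronger input never yields a smaller combined value.
        "monotonic": all(f(a, c) >= f(b, c) - tol for a, b, c in triples if a > b),
        "positively_reinforcing": all(f(a, b) >= max(a, b) - tol for a, b in pairs),
        "negatively_reinforcing": all(f(a, b) <= min(a, b) + tol for a, b in pairs),
    }


def probabilistic_sum(a, b):
    # Hypothetical example operator, chosen only to exercise the checks.
    return a + b - a * b


grid = [i / 10 for i in range(11)]
report = check_properties(probabilistic_sum, grid)
```

For this example operator the grid test reports commutativity, associativity, weak monotonicity, and positive reinforcement, and it rejects negative reinforcement.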


4.3. Subjective Bayesian Updating Rules 

Three different subjective Bayesian updating rules have been
obtained by applying one or two statistical independence assumptions to
Bayes’ rule.

Global independence over $\mathcal{E} = \{E_1, E_2, \ldots, E_m\}$ is defined as:

    $P(E_1 \& E_2 \& \ldots \& E_m) = \prod_{i=1}^{m} P(E_i)$    (4.3.1)

Conditional independence over $\mathcal{E}$ given a proposition is defined as:

    $P(E_1 \& E_2 \& \ldots \& E_m | A_j) = \prod_{i=1}^{m} P(E_i | A_j)$  for all $j = 1, \ldots, n$    (4.3.2)

Conditional independence over $\mathcal{E}$ given the negation of a proposition is
defined as:

    $P(E_1 \& E_2 \& \ldots \& E_m | \bar{A}_j) = \prod_{i=1}^{m} P(E_i | \bar{A}_j)$  for all $j = 1, \ldots, n$    (4.3.3)

Using Bayes’ rule, the posterior probability of $A_j$ given the combined
body of evidence can be written as

    $P(A_j | E_1 \& E_2 \& \ldots \& E_m) = \frac{P(E_1 \& E_2 \& \ldots \& E_m | A_j) \, P(A_j)}{P(E_1 \& E_2 \& \ldots \& E_m)}$    (4.3.4)


Under the assumption of conditional independence in eq. (4.3.2), the
Bayes formula in eq. (4.3.4) can be written as:

    $P(A_j | E_1 \& E_2 \& \ldots \& E_m) = \frac{P(A_j) \prod_{i=1}^{m} \frac{P(A_j | E_i)}{P(A_j)}}{\sum_{k=1}^{n} P(A_k) \prod_{i=1}^{m} \frac{P(A_k | E_i)}{P(A_k)}}$    (4.3.5)


This rule has been used by Cheng and Fu (1985) in a rule-based reasoning 
system for diagnosing diseases. 
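The updating rule of eq. (4.3.5) can be sketched in Python as follows. The sketch follows from Bayes' rule with the conditional independence assumption of eq. (4.3.2): each hypothesis receives the weight $P(A_k) \prod_i P(A_k|E_i)/P(A_k)$, and the weights are normalized over all hypotheses. The function name and the example numbers are assumptions made for illustration, and all prior probabilities are assumed to be strictly positive.

```python
import math


def bayes_update_conditional_independence(priors, posteriors):
    """Combine single-evidence posteriors under conditional independence.

    priors     : list of P(A_k) for k = 1..n, all strictly positive
    posteriors : list of m lists; posteriors[i][k] is P(A_k | E_i)
    Returns the list of P(A_k | E_1 & ... & E_m).
    """
    raw = [
        prior * math.prod(post[k] / prior for post in posteriors)
        for k, prior in enumerate(priors)
    ]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical example: three hypotheses and two bodies of evidence.
priors = [0.5, 0.3, 0.2]
posteriors = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]
combined = bayes_update_conditional_independence(priors, posteriors)
```

Because the prior appears in every factor, two pieces of evidence that each merely repeat the prior leave it unchanged, which is one way to see how this rule includes the effect of prior probabilities.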

The global independence assumption in equation (4.3.1) together with
the conditional independence in equation (4.3.2) rewrites Bayes’ rule as

    $P(A_j | E_1 \& E_2 \& \ldots \& E_m) = P(A_j) \prod_{i=1}^{m} \frac{P(E_i | A_j)}{P(E_i)} = P(A_j) \prod_{i=1}^{m} \frac{P(A_j | E_i)}{P(A_j)}$    (4.3.6)



Swain et al. (1985) have used this formula to construct a global membership 
function. Also, the rule for combining measures of belief and disbelief in MYCIN 
has been obtained from the binary form (m= 2) of eq. (4.3.6) after translating 
probabilities to its own measures of belief and disbelief. 

Also, applying both conditional independence assumptions to Bayes’
rule, we can derive the following combining function

    $P(A_j | E_1 \& E_2 \& \ldots \& E_m) = \frac{\prod_{i=1}^{m} P(A_j | E_i)}{\prod_{i=1}^{m} P(A_j | E_i) + \prod_{i=1}^{m} P(\bar{A}_j | E_i)}$    (4.3.7)

which is the updating rule used in PROSPECTOR, a rule-based computer
consultant system intended to aid geologists in evaluating the favorability of an
exploration site for occurrences of ore deposits of particular types. Interestingly,
this rule is a special case of the rule in eq. (4.3.5) when $P(A_j) = \frac{1}{2}$ for all $j$.

Nevertheless, it is more appealing because this rule expresses the combined 
measure in terms of only the conditional probabilities of individual bodies of 
evidence. Note that the rules expressed in eq. (4.3.5) and eq. (4.3.6) include 
the effect of prior probabilities in combining bodies of evidence.

All of the subjective Bayesian updating rules described in this section are
decomposable. The binary operator of each rule can be easily obtained by
setting $m = 2$. In the following, we will take a closer look at the characteristics of
the rule expressed in eq. (4.3.7).

For a subset $A$ of $\Omega$, set $P(A|E_1) = p_1$ and $P(A|E_2) = p_2$. Since $P(\cdot)$ is
additive, $P(\bar{A}|E_i) = 1 - p_i$ for $i = 1, 2$. The binary operator of the rule in equation
(4.3.7) is given as:

    $f_A(p_1, p_2) \; (= P(A | E_1 \& E_2)) = \frac{p_1 p_2}{p_1 p_2 + (1 - p_1)(1 - p_2)}$    (4.3.8)

The above binary operator has the following properties:

(1) Positively reinforcing when $p_1, p_2 \ge \frac{1}{2}$ and negatively reinforcing when
$p_1, p_2 \le \frac{1}{2}$. Not defined in terms of reinforcement when $p_1 < \frac{1}{2}$ and $p_2 > \frac{1}{2}$, or
$p_1 > \frac{1}{2}$ and $p_2 < \frac{1}{2}$.

(2) When $p_1 = \frac{1}{2}$, $f_A(p_1, p_2) = p_2$; $\frac{1}{2}$ is the identity of the binary operator. Since
the rule deals with additive probabilities, $\frac{1}{2}$ represents the total ignorance of
evidence for the rule.

(3) When $p_1 = 0$ (or 1), $f_A(p_1, p_2) = 0$ (or 1) except when $p_2 = 1$ (or 0); 0 and 1 are the
annihilators of the binary operator, that is, when $E_1$ provides complete
certainty either for $A$ ($p_1 = 1$) or for $\bar{A}$ ($p_1 = 0$), the other body of evidence
cannot affect the combined belief measure.

(4) $f_A(0, 1)$ and $f_A(1, 0)$ are not defined; this rule cannot combine two bodies of
evidence which are completely contradictory.
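The four properties can be observed directly by evaluating the binary operator of eq. (4.3.8). The following Python sketch is an illustration; the function name `f_A` mirrors the notation of the text, while the choice to raise an exception for the undefined cases and the sample values are assumptions made here.

```python
def f_A(p1, p2):
    """Binary operator of eq. (4.3.8) for additive probabilities p1, p2 in [0, 1]."""
    denom = p1 * p2 + (1.0 - p1) * (1.0 - p2)
    if denom == 0.0:
        # Only happens for (0, 1) and (1, 0): completely contradictory evidence.
        raise ZeroDivisionError("f_A is not defined for completely contradictory evidence")
    return p1 * p2 / denom


reinforced_up = f_A(0.8, 0.7)    # both inputs above 1/2
reinforced_down = f_A(0.3, 0.2)  # both inputs below 1/2
unchanged = f_A(0.5, 0.3)        # 1/2 acts as the identity
annihilated = f_A(0.0, 0.9)      # 0 acts as an annihilator
```

Here `reinforced_up` exceeds both inputs, `reinforced_down` falls below both inputs, `unchanged` equals the second input, and `annihilated` is zero regardless of the second input.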

Figure 4.1 is a graphical interpretation of the binary operator based on 
set-theoretic operations. In the figure, the upper-left rectangle represents the 
degree of belief for A based on the combined evidence while the lower-right 
rectangle represents the degree of belief against A based on the combined 
evidence. The upper-right and lower-left rectangles represent the measure 
which fails to be committed to either $A$ or $\bar{A}$.

                              $P(A|E_2) = p_2$                 $P(\bar{A}|E_2) = 1 - p_2$

$P(A|E_1) = p_1$              $A \cap A = A$                   $A \cap \bar{A} = \emptyset$
                              $p_1 \cdot p_2$                  $p_1 \cdot (1 - p_2)$

$P(\bar{A}|E_1) = 1 - p_1$    $\bar{A} \cap A = \emptyset$     $\bar{A} \cap \bar{A} = \bar{A}$
                              $(1 - p_1) \cdot p_2$            $(1 - p_1) \cdot (1 - p_2)$

Figure 4.1 Graphical Interpretation of Binary Operator of Subjective
           Bayesian Updating Rule in Equation (4.3.7)

The question now is which independence assumption is empirically
more reasonable and yields a better updating scheme. It has been shown,
somewhat controversially, that there is inconsistency between some independence
assumptions and their updating rules. We will begin the discussion with the
following lemmas, which were stated and proven by Pednault et al. (1981) and
Johnson (1986), respectively.


Lemma 4.1. If $\Omega$ consists of $n$ ($n > 2$) mutually exclusive and exhaustive
propositions, i.e., if $\sum_{i=1}^{n} P(A_i) = 1$ and $P(A_i \& A_j) = 0$ for $i \ne j$, then equations (4.3.2)
and (4.3.3) together imply equation (4.3.1).

When $n = 2$ ($\Omega = \{A, \bar{A}\}$), the above lemma does not hold.

Lemma 4.2. If $\Omega$ consists of $n$ mutually exclusive and exhaustive propositions
where $n > 2$, and if equations (4.3.2) and (4.3.3) are assumed, then there is at
most one piece of evidence that produces updating for the proposition.

Lemma 4.2 says that under the above conditions regarding $\Omega$, at most one
piece of evidence can alter the probability of any given proposition; thus,
although updating is possible, multiple updating for any of the propositions is
impossible. The following lemma is from Cheng et al. (1986).

Lemma 4.3. Suppose that $\Omega = \{A, \bar{A}\}$. If equations (4.3.1), (4.3.2), and (4.3.3)
are assumed, then there is at most one piece of evidence that produces
updating for each proposition.

As a consequence of the above lemmas, in order for probabilities of two 
or more mutually exclusive and exhaustive propositions to be updated and 
allow multiple pieces of evidence to influence a decision, one of the conditional 
independence assumptions should be eliminated. In fact, Charniak (1983) and 
Johnson recommend the updating scheme in eq. (4.3.5) for inference about any 
number of mutually exclusive and exhaustive propositions.


4.4. Dempster’s Rule of Combination

Dempster’s rule is a generalized scheme of Bayesian inference to
aggregate bodies of evidence provided by multiple information sources. Let $m_1$
and $m_2$ be the basic probability assignments associated respectively with the
belief functions $Bel_1$ and $Bel_2$ which are inferred from two entirely distinct bodies
of evidence $E_1$ and $E_2$. For all $A_i$, $B_j$, and $X_k \subseteq \Omega$, Dempster’s rule (or
Dempster’s orthogonal sum) gives a new belief function denoted by

    $Bel = Bel_1 \oplus Bel_2$    (4.4.1)

The basic probability assignment associated with the new belief function is
defined as:

    $m(X_k) = (1 - \xi)^{-1} \sum_{A_i \cap B_j = X_k} m_1(A_i) \cdot m_2(B_j)$  ($X_k \ne \emptyset$)    (4.4.2)

where $\xi$ is the measure of conflict between $Bel_1$ and $Bel_2$, as defined in
Definition 3.4.

Dempster’s rule computes the basic probability of $X_k$, $m(X_k)$, from the
product of $m_1(A_i)$ and $m_2(B_j)$ by considering all $A_i$ and $B_j$ whose intersection is
$X_k$. Once $m$ is computed for every $X_k \subseteq \Omega$, the belief function is obtained by the
sum of $m$'s committed to $X_k$ and its subsets. The denominator $(1 - \xi)$ normalizes
the result to compensate for the measure committed to the empty set so that the
total probability mass has measure one. Consequently, Dempster’s rule
discards the conflict between $E_1$ and $E_2$ and carries their consensus to the new
belief function.

There are several points of interest with regard to this rule. First, it 
requires that the basic probability assignments to be combined be based on 
entirely distinct bodies of evidence and refer to the same frame of discernment 
$\Omega$. Secondly, it is both commutative and associative. Therefore, the order or
grouping of evidence in combination does not affect the result, and a sequence
of information sources can be combined either sequentially or pairwise. Finally,
$\xi$ in the above equation is the measure of conflict between $E_1$ and $E_2$, which
represents the amount of the total probability that is committed to disjoint (or
contradictory) subsets of $\Omega$. If $\xi$ is equal to one, this means that $E_1$ and $E_2$ are
completely contradictory and the orthogonal sum of their basic probability
assignments does not exist.
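The orthogonal sum can be sketched in Python for basic probability assignments stored as dictionaries. This is a minimal illustration, not the implementation used in this report: the representation of focal elements as `frozenset` objects, the function names, and the example masses are assumptions. The measure of conflict is computed here as the total product mass that falls on empty intersections.

```python
def dempster_combine(m1, m2):
    """Orthogonal sum of two basic probability assignments on the same frame.

    m1, m2 : dicts mapping frozenset focal elements to masses that sum to 1.
    Returns (combined assignment, measure of conflict).
    """
    unnormalized = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            x = a & b
            if x:
                unnormalized[x] = unnormalized.get(x, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass that would fall on the empty set
    if sum(unnormalized.values()) == 0.0:
        raise ValueError("completely contradictory evidence; orthogonal sum undefined")
    combined = {x: v / (1.0 - conflict) for x, v in unnormalized.items()}
    return combined, conflict


def belief(m, subset):
    """Belief in `subset`: total mass of the focal elements contained in it."""
    return sum(v for x, v in m.items() if x <= subset)


# Hypothetical example on a two-element frame.
A, NOT_A = frozenset({"A"}), frozenset({"not A"})
OMEGA = A | NOT_A
m1 = {A: 0.6, NOT_A: 0.1, OMEGA: 0.3}
m2 = {A: 0.5, NOT_A: 0.2, OMEGA: 0.3}
m12, conflict = dempster_combine(m1, m2)
```

Because the rule is commutative and associative, more than two sources can be combined by applying `dempster_combine` repeatedly in any order.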

To exhibit the properties of Dempster’s rule, suppose that there are only
two focal elements $A$ and $\bar{A}$ in $\Omega$ and the basic probability assignment $m_i$ based
on $E_i$ is given as:

    $m_i(A) = p_i$,  $m_i(\bar{A}) = q_i$,  $m_i(\Omega) = 1 - p_i - q_i$  for $i = 1, 2$    (4.4.3)

where $p_i + q_i < 1$, i.e., they are non-additive.

Then, the respective interval-valued belief function given $E_i$ ($i = 1, 2$)
supports $A$ with $[p_i, 1 - q_i]$, and $\bar{A}$ with $[q_i, 1 - p_i]$. Dempster’s rule produces the
new basic probability assignment $m$, and by equation (2.2.9) the support
function for $A$ and $\bar{A}$ based on the total evidence is given as:

    $Sp(A | E_1 \& E_2) = \frac{p_1 p_2 + p_1 (1 - p_2 - q_2) + (1 - p_1 - q_1) p_2}{1 - p_1 q_2 - q_1 p_2} = 1 - \frac{(1 - p_1)(1 - p_2)}{1 - p_1 q_2 - q_1 p_2}$    (4.4.4)

    $Sp(\bar{A} | E_1 \& E_2) = \frac{q_1 q_2 + q_1 (1 - p_2 - q_2) + (1 - p_1 - q_1) q_2}{1 - p_1 q_2 - q_1 p_2} = 1 - \frac{(1 - q_1)(1 - q_2)}{1 - p_1 q_2 - q_1 p_2}$    (4.4.5)

Figure 4.2 shows the graphical interpretation of Dempster’s rule for the above
case. The probability mass committed to $\Omega$ represents the uncertainty
concerning the support for $A$ and $\bar{A}$. The conjugate plausibility function $Pl$ is
obtained by equation (2.2.14). In general, Dempster’s rule has the following
properties:

(1) Commutativity and associativity.

(2) $[Sp, Pl] \oplus [0, 1] = [Sp, Pl]$; $[0, 1]$ plays the role of identity for the rule.

(3) When $p_i + q_i = 1$, i.e., they are additive, equation (4.4.4) is equal to equation
(4.3.8), and the resulting belief function becomes additive.

(4) For any interval $[Sp, Pl] \ne [0, 0]$, $[Sp, Pl] \oplus [1, 1] = [1, 1]$, and for any interval
$[Sp, Pl] \ne [1, 1]$, $[Sp, Pl] \oplus [0, 0] = [0, 0]$; $[0, 0]$ and $[1, 1]$ are annihilators for the rule.

(5) $[0, 0] \oplus [1, 1]$ is undefined; Dempster’s rule cannot combine completely
conflicting bodies of evidence.

(6) The combined interval is no wider than any interval to be combined, i.e.,

    $\frac{(1 - p_1 - q_1)(1 - p_2 - q_2)}{1 - p_1 q_2 - q_1 p_2} \le 1 - p_i - q_i$  for $i = 1, 2$    (4.4.6)

Since the width of an interval-valued belief measure corresponds to the
measure of uncertainty, it seems intuitively reasonable that the value of the
measure of uncertainty decreases as the amount of evidential information
increases.


                                 $m_2(A) = p_2$                    $m_2(\bar{A}) = q_2$              $m_2(\Omega) = 1 - p_2 - q_2$

$m_1(A) = p_1$                   $A \cap A = A$                    $A \cap \bar{A} = \emptyset$      $A \cap \Omega = A$
                                 $p_1 \cdot p_2$                   $p_1 \cdot q_2$                   $p_1 \cdot (1 - p_2 - q_2)$

$m_1(\bar{A}) = q_1$             $\bar{A} \cap A = \emptyset$      $\bar{A} \cap \bar{A} = \bar{A}$  $\bar{A} \cap \Omega = \bar{A}$
                                 $q_1 \cdot p_2$                   $q_1 \cdot q_2$                   $q_1 \cdot (1 - p_2 - q_2)$

$m_1(\Omega) = 1 - p_1 - q_1$    $\Omega \cap A = A$               $\Omega \cap \bar{A} = \bar{A}$   $\Omega \cap \Omega = \Omega$
                                 $(1 - p_1 - q_1) \cdot p_2$       $(1 - p_1 - q_1) \cdot q_2$       $(1 - p_1 - q_1) \cdot (1 - p_2 - q_2)$

Figure 4.2 Graphical Interpretation of Dempster’s Rule when $\Omega = \{A, \bar{A}\}$

The only condition that Dempster’s rule requires is that the bodies of
evidence to be combined must be entirely distinct. In the context of the problem
of multisource data classification, combining entirely distinct bodies of evidence
is considered as a fusion of the individual observations provided by
independent sensors. The meaning of independence here is that an
observation from one sensor does not have any effect on an observation from
another.

4.5. Robustness of Combination Rules 

The previous two sections described the functional characteristics of the
subjective Bayesian updating rules and Dempster’s rule in terms of the
desirable properties of combination rules. In this section, the binary operators
of Dempster’s rule (eq. (4.4.4)) and a subjective Bayesian updating rule (eq.
(4.3.8)) are compared with respect to their sensitivity to small changes of the
initial belief measures to be combined.

Suppose we are classifying a pixel denoted by a vector $X$ into one of a
set of mutually exclusive and exhaustive classes, $\omega_1$, $\omega_2$, and $\omega_3$, based on two
independent data sources. Let $E_1$ and $E_2$ denote the bodies of evidence
provided by the two data sources, and $\Omega = \{A_1, A_2, A_3\}$ denote the frame of
discernment, where $A_i$ represents the event of $X$ being classified to $\omega_i$.
Suppose that the basic probability assignment numbers based on each data
source are given as:

    $m_1(A_1) = \delta$,  $m_1(A_2) = 1 - \delta - \rho$,  $m_1(A_3) = \rho$    (4.5.1)

and

    $m_2(A_1) = 1 - \delta - \rho$,  $m_2(A_2) = \delta$,  $m_2(A_3) = \rho$    (4.5.2)
Note that the above measures are additive, i.e., there is no measure of 
uncertainty. Hence, both data sources are believed to be completely reliable, 
and the information provided by the data sources is assumed to be exact and 
precise for representing the belief measures. 

When $\delta = 0$ and $0 < \rho \ll 1$, there is strong conflict between the bodies of
evidence provided by the data sources. The only agreement between them is
that $A_3$ is highly improbable. In other words, $X$ is hardly believed to belong to
$\omega_3$. Contrary to this expectation, equation (4.3.8) (recall that it is a special case of
Dempster’s rule when the belief measures are additive) yields the combined
measures as:

    $m(A_1 | E_1 \& E_2) = m(A_2 | E_1 \& E_2) = 0$,  $m(A_3 | E_1 \& E_2) = 1$    (4.5.3)

without regard to the value of $\rho$. The result expresses that $\omega_3$ is the only
possible class for $X$, which is completely against our intuition.

Now, in order to examine how sensitive the combination rule is to slight
changes of initial measures, let $\delta$ be a non-zero small number. Then, we find

    $m(A_1 | E_1 \& E_2) = m(A_2 | E_1 \& E_2) = \frac{\delta (1 - \delta - \rho)}{2 \delta (1 - \delta - \rho) + \rho^2}$

    $m(A_3 | E_1 \& E_2) = \frac{\rho^2}{2 \delta (1 - \delta - \rho) + \rho^2}$    (4.5.4)

Table 4.1 shows the results of the equation (4.3.8) for various small values of $\delta$
when $\rho = 0.1$.


Table 4.1 Result of Combination by Dempster’s Rule for
          Additive Belief Measures.

$\rho = 0.1$               $\delta = 0.001$   $\delta = 0.01$   $\delta = 0.05$
$m(A_1 | E_1 \& E_2)$          0.076             0.320            0.447
$m(A_2 | E_1 \& E_2)$          0.076             0.320            0.447
$m(A_3 | E_1 \& E_2)$          0.848             0.360            0.106


By comparing the combined measures for $\delta = 0.001$ and $0.05$, we can
conclude that this extreme sensitivity may lead to totally different decisions
when the numerical representation of belief is coarse. Recall that the measures
of belief in the above example are additive. Will Dempster’s rule show such
sensitivity when the measures of belief are subadditive?
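The sensitivity in the additive case can be reproduced with a few lines of Python. The sketch evaluates the combined measures of eq. (4.5.4), i.e., the products of the agreeing masses from (4.5.1) and (4.5.2) normalized by their sum. The function name is an assumption, and no exact agreement with the rounded entries of Table 4.1 is claimed beyond a few thousandths.

```python
def combine_additive(delta, rho):
    """Combined measures for the additive assignments (4.5.1) and (4.5.2)."""
    agree_12 = delta * (1.0 - delta - rho)  # product of the two masses on A1 (same for A2)
    agree_3 = rho * rho                     # product of the two masses on A3
    total = 2.0 * agree_12 + agree_3        # mass that is not in conflict
    return agree_12 / total, agree_12 / total, agree_3 / total


for delta in (0.001, 0.01, 0.05):
    m_a1, m_a2, m_a3 = combine_additive(delta, rho=0.1)
    print(f"delta={delta}: m(A1)=m(A2)={m_a1:.3f}, m(A3)={m_a3:.3f}")
```

Changing $\delta$ from 0.001 to 0.05 moves the largest combined measure from $A_3$ to $A_1$ and $A_2$, which is the reversal of the decision discussed above.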

When the data sources are not completely reliable, which is true in most 
cases of real world data sources, the belief measures based on the partially 
reliable sources include the measure of uncertainty. Suppose both data 
sources are assigned the same amount of measure of uncertainty $\alpha$, that is,

    $m_1(\Omega) = m_2(\Omega) = \alpha$

where $0 < \alpha < 1$. $\alpha$ is assigned to the frame of discernment $\Omega$ to represent the
partial ignorance of belief based on the incomplete data sources. Then, the
initial measures in (4.5.1) and (4.5.2), which were additive, are reduced as:

    $m_1(A_1) = (1 - \alpha)\delta$,  $m_1(A_2) = (1 - \alpha)(1 - \delta - \rho)$,  $m_1(A_3) = (1 - \alpha)\rho$    (4.5.5)

and

    $m_2(A_1) = (1 - \alpha)(1 - \delta - \rho)$,  $m_2(A_2) = (1 - \alpha)\delta$,  $m_2(A_3) = (1 - \alpha)\rho$    (4.5.6)

Now, the belief measures become non-additive, and they are represented in
terms of interval-valued probabilities in Table 4.2. In this particular case, since
all the focal elements are singletons, the widths of their IV belief measures are the
same.


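Table 4.2 Interval-valued Belief Measures after Combination
          by Dempster’s Rule for Non-additive Belief Measures.

         $E_1$                                                               $E_2$
         $Sp$                            $Pl$                                $Sp$                            $Pl$
$A_1$    $(1-\alpha)\delta$              $(1-\alpha)\delta + \alpha$         $(1-\alpha)(1-\delta-\rho)$     $(1-\alpha)(1-\delta-\rho) + \alpha$
$A_2$    $(1-\alpha)(1-\delta-\rho)$     $(1-\alpha)(1-\delta-\rho) + \alpha$  $(1-\alpha)\delta$            $(1-\alpha)\delta + \alpha$
$A_3$    $(1-\alpha)\rho$                $(1-\alpha)\rho + \alpha$           $(1-\alpha)\rho$                $(1-\alpha)\rho + \alpha$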


Dempster’s rule yields the new basic probability assignment as:

    $m(A_1 | E_1 \& E_2) = m(A_2 | E_1 \& E_2) = \frac{(1 - \alpha)\{\delta (1 - \alpha)(1 - \delta - \rho) + \alpha (1 - \rho)\}}{1 - \xi}$    (4.5.7)

    $m(A_3 | E_1 \& E_2) = \frac{\rho (1 - \alpha)\{(1 - \alpha)\rho + 2\alpha\}}{1 - \xi}$    (4.5.8)

and

    $m(\Omega | E_1 \& E_2) = \frac{\alpha^2}{1 - \xi}$    (4.5.9)

where $\xi = (1 - \alpha)^2 \{1 - \rho^2 + 2\delta(\rho + \delta - 1)\}$.

Let $\alpha = 0.1$, which means that the data sources are highly reliable but still
incomplete. For $\delta = 0$ and $\rho = 0.1$, the combined measures are:

    $m(A_1 | E_1 \& E_2) = m(A_2 | E_1 \& E_2) = 0.409$,  $m(A_3 | E_1 \& E_2) = 0.132$

Compared to those which are additive, the non-additive measures, after being
combined by Dempster’s rule, are more in accordance with human intuition.
Table 4.3 shows the results of Dempster’s rule combining non-additive
measures for various small values of $\delta$.


Table 4.3 Result of Combination by Dempster’s Rule for
          Non-additive Belief Measures.

$\alpha = 0.1$, $\rho = 0.1$     $\delta = 0.001$   $\delta = 0.01$   $\delta = 0.05$
$m(A_1 | E_1 \& E_2)$                0.409             0.414            0.432
$m(A_2 | E_1 \& E_2)$                0.409             0.414            0.432
$m(A_3 | E_1 \& E_2)$                0.131             0.123            0.098
$m(\Omega | E_1 \& E_2)$             0.051             0.049            0.038


By assigning a small amount of uncertainty to the data sources, we can avoid 
the extreme sensitivity of Dempster’s rule to slight changes of measures 
provided by conflicting bodies of evidence. 
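The non-additive case can be sketched in the same way by evaluating eqs. (4.5.7) through (4.5.9). The function name is an assumption, and the conflict term is written with $\rho^2$, which is what the products of the masses in (4.5.5) and (4.5.6) give. The computed values can be compared with Table 4.3, with which they agree to within a few thousandths.

```python
def combine_nonadditive(delta, rho, alpha):
    """Combined masses when each source keeps mass alpha on the whole frame.

    Returns (m(A1), m(A2), m(A3), m(Omega)) after Dempster's rule.
    """
    r = 1.0 - alpha
    # Total product mass on disjoint pairs of singletons (measure of conflict).
    conflict = r * r * (1.0 - rho * rho + 2.0 * delta * (rho + delta - 1.0))
    norm = 1.0 - conflict
    m_a1 = r * (delta * r * (1.0 - delta - rho) + alpha * (1.0 - rho)) / norm
    m_a3 = rho * r * (r * rho + 2.0 * alpha) / norm
    m_omega = alpha * alpha / norm
    return m_a1, m_a1, m_a3, m_omega


for delta in (0.0, 0.001, 0.01, 0.05):
    m_a1, m_a2, m_a3, m_omega = combine_nonadditive(delta, rho=0.1, alpha=0.1)
    print(f"delta={delta}: m(A1)=m(A2)={m_a1:.3f}, m(A3)={m_a3:.3f}, m(Omega)={m_omega:.3f}")
```

Unlike the additive case, the ordering of the combined measures does not change as $\delta$ varies over this range: $A_1$ and $A_2$ remain the best supported classes and $A_3$ remains weakly supported.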

After the problem of extreme sensitivity of Dempster’s rule was exposed
by Zadeh (1979), Dubois and Prade (1985) proposed as an alternative a
possibilistic rule of combination based on the theory of possibility, which is
related to fuzzy set theory. Zadeh and Dubois et al. argue that the extreme
sensitivity of Dempster’s rule in combining additive probabilities is the effect of
the normalization in its denominator. They think that the normalization
suppresses an important aspect of information obtained from the conflicting 
bodies of evidence, so that Dempster’s rule may yield highly counterintuitive 
results. According to the above example, however, the cause of the extreme 
sensitivity lies in incorrect representation of belief, not in Dempster’s rule itself. 
Recall that the frame of discernment consists of mutually exclusive and 
exhaustive hypotheses. If two sources were completely reliable, there might be 
little conflict between the bodies of evidence provided by them. Conversely, if 
there were strong conflict between bodies of evidence, the sources providing 
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the evidence could not be completely reliable, either or both of them should 
have non-zero measure of uncertainty. In conclusion, interval-valued 
probabilities are more adequate than conventional additive probabilities to 
represent belief. 


4.6. Summary 

In this chapter, after defining desirable properties for combination rules to 
be formulated as functional equations, the inferencing mechanisms of 
subjective Bayesian updating rules and Dempster’s rule were examined in 
terms of their properties. The comparison revealed that Dempster’s rule is a 
more general scheme to combine bodies of evidence providing the belief 
functions represented by interval-valued probabilities. It has been observed 
that in combining conflicting bodies of evidence, Dempster’s rule produces 
more robust and consistent combined belief measures when the belief 
measures are interval-valued. 

In this chapter, the contributions of this research are the formal definitions 
of the desirable properties of combination rules, interpretations of the 
inferencing mechanisms of the existing combination rules, and the analysis of 
the robustness of Dempster’s rule in the aspect of its differential behavior 
according to slight changes of initial belief measures. 
CHAPTER 5

DECISION MAKING BASED ON INTERVAL-VALUED 

PROBABILITIES 


5.1. Introduction 

Making a decision is the last step before evaluating the performance of a 
classifier in any pattern recognition problem. Over the past three decades, 
statistical decision theory has played an important role in the decision process 
of statistical pattern recognition techniques. 

In conventional statistical methods for pattern recognition where 
statistical information is represented by point-valued probabilities, there is only 
one decision rule to use in deciding whether or not a given pattern belongs to 
some prespecified class of patterns. The decision rule gives an estimate of the 
unknown, true class of the pattern, and the estimate varies depending on the 
criterion underlying the decision rule. For example, the “Bayes decision rule" is 
devised in such a way that the “average risk” is minimized. The Maximum 
Posterior classification, which is the most common classification method in 
remote sensing, uses a “Bayes decision rule with 0-1 loss function.” 

In the previous chapters, representation and combination of statistical
evidence in the form of interval-valued probabilities were studied. Although
interval-valued probabilities provide an innovative means for the representation
of evidential information, they make the decision process rather complicated
and entail more intelligent strategies in making decisions. Based on the
evidential interval bounded by degrees of support and plausibility, one has
more than one choice for a decision rule. One can make a decision based on
the support alone, on the plausibility alone, or on their average.

This chapter presents an account of the basic elements of decision
theory for pattern recognition based on interval-valued probabilities. It will be
seen that, under a certain condition, those basic elements are a generalization
of the elements of Bayesian decision theory. This chapter also formalizes the
decision-making process and develops decision rules for the evidential
intervals.


5.2. Interval-Valued Expectations 

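Let $[\mathcal{L}, \mathcal{U}]$ be an interval-valued probability defined on the Boolean
algebra $\mathcal{B}$ of subsets of $\Omega$, and let $V$ denote a real-valued function
defined over $\Omega = \{\omega\}$. Dempster (1968) defines an "upper distribution function"
and a "lower distribution function" respectively as:

\[ F^*(v) = \mathcal{U}(\{\omega \mid V(\omega) \le v\}), \qquad F_*(v) = \mathcal{L}(\{\omega \mid V(\omega) \le v\}) \qquad \text{for } -\infty < v < \infty \tag{5.2.1} \]

The pair $[F_*, F^*]$ defined above has the following properties:

(i) Both are nondecreasing, i.e.,

\[ \text{if } v_1 < v_2 \text{ then } F^*(v_1) \le F^*(v_2) \text{ and } F_*(v_1) \le F_*(v_2) \tag{5.2.2} \]

(ii) Both are continuous from the right, i.e., for $\varepsilon > 0$,

\[ \lim_{\varepsilon \to 0} F^*(v+\varepsilon) = F^*(v) \text{ and } \lim_{\varepsilon \to 0} F_*(v+\varepsilon) = F_*(v) \tag{5.2.3} \]

(iii)
\[ F^*(+\infty) = F_*(+\infty) = 1, \qquad F^*(-\infty) = F_*(-\infty) = 0 \tag{5.2.4} \]

(iv)
\[ \text{If } F^*(v_0) = 0 \; (F_*(v_0) = 0), \text{ then } F^*(v) = 0 \; (F_*(v) = 0) \text{ for every } v \le v_0 \tag{5.2.5} \]

(v)
\[ F^*(v) \ge F_*(v) \qquad \text{for } -\infty < v < \infty \tag{5.2.6} \]
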


The proofs of the above properties are straightforward. Properties (i) - (iv)
are the same as the properties of the ordinary distribution function; refer to
Papoulis (1984) for their proofs. Property (v) is a direct consequence of
eq. (2.4.3).

Further, Dempster defines his "upper expectation" and "lower
expectation" as:

\[ E^*(V) = \int_{-\infty}^{\infty} v \, dF_*(v), \qquad E_*(V) = \int_{-\infty}^{\infty} v \, dF^*(v) \tag{5.2.7} \]



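Note that the upper and lower stars are interchanged. This is necessary in order
to keep the relation $E^*(V) \ge E_*(V)$. For any real-valued functions $V$ and $W$
defined over $\Omega$, $E^*$ and $E_*$ have the following properties:

(i)
\[ E^*(V) \ge E_*(V) \tag{5.2.8} \]

(ii) If $V(\omega) \ge W(\omega)$ for all $\omega \in \Omega$, then

\[ E^*(V) \ge E^*(W) \text{ and } E_*(V) \ge E_*(W) \tag{5.2.9} \]

Dempster's upper and lower expectations generalize the concepts of
upper and lower probabilities. In detail, let $Z_A$ be the indicator
function of $A \subset \Omega$, i.e.,

\[ Z_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise} \end{cases} \tag{5.2.10} \]

Then, by the above definitions and the conjugate relationship of $\mathcal{U}$ and $\mathcal{L}$,

\[ \begin{aligned}
E^*(Z_A) &= \int_{-\infty}^{+\infty} z \, dF_*(z) = \int_0^1 z \, d\mathcal{L}(\{\omega \mid Z_A(\omega) \le z\}) = \mathcal{L}(\Omega) - \mathcal{L}(\bar{A}) = 1 - \mathcal{L}(\bar{A}) = \mathcal{U}(A) \\
E_*(Z_A) &= \int_{-\infty}^{+\infty} z \, dF^*(z) = \int_0^1 z \, d\mathcal{U}(\{\omega \mid Z_A(\omega) \le z\}) = \mathcal{U}(\Omega) - \mathcal{U}(\bar{A}) = 1 - \mathcal{U}(\bar{A}) = \mathcal{L}(A)
\end{aligned} \tag{5.2.11} \]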
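
The reduction of the upper and lower expectations of an indicator function to the upper and lower probabilities can be checked on a finite frame. The sketch below is illustrative only: it assumes that the interval-valued probability is a support/plausibility pair generated by a basic probability assignment (the three-element frame and its masses are hypothetical), and it evaluates the Stieltjes integrals of eq. (5.2.7) as finite sums over the jumps of the upper and lower distribution functions.

```python
def support(m, A):
    """Lower probability of A: total mass of the focal elements contained in A."""
    return sum(v for B, v in m.items() if B <= A)


def plausibility(m, A):
    """Upper probability of A: total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)


def interval_expectation(m, V):
    """Return (lower, upper) expectation of V, a dict from frame elements to reals."""
    lower_e = upper_e = 0.0
    prev_upper_F = prev_lower_F = 0.0
    for v in sorted(set(V.values())):
        level = frozenset(w for w in V if V[w] <= v)   # the set {V <= v}
        upper_F = plausibility(m, level)               # upper distribution function
        lower_F = support(m, level)                    # lower distribution function
        lower_e += v * (upper_F - prev_upper_F)        # lower expectation uses upper F
        upper_e += v * (lower_F - prev_lower_F)        # upper expectation uses lower F
        prev_upper_F, prev_lower_F = upper_F, lower_F
    return lower_e, upper_e


# Hypothetical basic probability assignment on the frame {w1, w2, w3}.
m = {
    frozenset({"w1"}): 0.5,
    frozenset({"w2", "w3"}): 0.3,
    frozenset({"w1", "w2", "w3"}): 0.2,
}
A = frozenset({"w1"})
Z_A = {w: (1.0 if w in A else 0.0) for w in ("w1", "w2", "w3")}

lower_e, upper_e = interval_expectation(m, Z_A)
print(round(lower_e, 3), round(support(m, A), 3))        # both equal the support of A
print(round(upper_e, 3), round(plausibility(m, A), 3))   # both equal the plausibility of A
```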
For pattern recognition problems, it seems natural to define upper and
lower probabilities respectively by upper and lower envelopes, i.e., the
supremum and the infimum of a certain class of probability measures as
expressed in Definition 2.2. As mentioned earlier in Section 2.4, the envelopes
are a subclass of the axiomatically defined interval-valued probabilities. Also, if
$\mathcal{L}$ is 2-monotone and $\mathcal{U}$ is 2-alternating, then they are envelopes.

Suppose that $\mathcal{L}$ and $\mathcal{U}$ are given as

\[ \mathcal{L}(A) = \inf \{ \pi(A) : \pi \in \mathcal{P} \}, \qquad \mathcal{U}(A) = \sup \{ \pi(A) : \pi \in \mathcal{P} \} \qquad \text{for } A \in \mathcal{B} \tag{5.2.12} \]

where $\mathcal{P}$ is the class of the probability measures dominated by $\mathcal{U}$. Then, the
following lemma is proved by Wolfenson and Fine (1982).


Lemma 5.1. For an interval-valued probability $[\mathcal{L}, \mathcal{U}]$, the upper and lower
expectations can be given as:

\[ E^*(V) = \sup_{\pi \in \mathcal{P}} E_\pi(V), \qquad E_*(V) = \inf_{\pi \in \mathcal{P}} E_\pi(V) \tag{5.2.13} \]

iff $\mathcal{L}$ is 2-monotone and $\mathcal{U}$ is 2-alternating, where $V$ is a real-valued function
over $\Omega$ and $E_\pi(V)$ is the expected value of $V$ with respect to the probability
measure $\pi$.


The upper and lower expectations in (5.2.13) have the following
properties as well as the properties in (5.2.8) and (5.2.9):

(iii)
\[ E_*(V) \le E_\pi(V) \le E^*(V) \qquad \text{for any } \pi \in \mathcal{P} \tag{5.2.14} \]

(iv) For any nonnegative function $W$ over $\Omega$,

\[ E^*(a + bW) = \begin{cases} a + b E^*(W) & \text{if } b \ge 0 \\ a + b E_*(W) & \text{if } b < 0 \end{cases} \tag{5.2.15} \]

\[ E_*(a + bW) = \begin{cases} a + b E_*(W) & \text{if } b \ge 0 \\ a + b E^*(W) & \text{if } b < 0 \end{cases} \tag{5.2.16} \]

where $a$ and $b$ are constants.

This section introduced two different definitions of the interval-valued
expectations: one which applies to any system of interval-valued probabilities,
and the other which applies only to a system of 2-monotone and 2-alternating
interval-valued probabilities. In general, the two definitions do not coincide over
the class of all sets of probability measures on $\mathcal{B}$. Dempster (1968) already
argued that for a general convex set $\mathcal{P}$, it can happen that

\[ \int_{-\infty}^{\infty} v \, dF^*(v) < \inf_{\pi \in \mathcal{P}} E_\pi(V) \tag{5.2.17} \]

The second definition is not only inapplicable to a general system of
interval-valued probabilities but also computationally intractable. For the
expectations in eq. (5.2.13) to be useful, an explicit expression of $\pi$ in
$\mathcal{P}$ must be available.

5.3. Decision Rules based on Interval-Valued Probability 

Consider a basic classification problem where an arbitrary pattern $x \in X$
from an unknown class is assigned to one of $n$ classes in $\Omega$. Let $\lambda(\omega_i \mid \omega_j)$ be a
measure of the "loss" incurred when the decision $\omega_i$ is made and the true
pattern class is in fact $\omega_j$, where $i, j = 1, \ldots, n$. Also, let $\hat{\omega}(x)$ denote a decision
rule that tells which class to choose for every pattern $x$. Using the upper and
lower expectations in eq. (5.2.7), the "upper expected loss" and the "lower
expected loss" of making a decision $\hat{\omega}(x) = \omega_i$ are obtained as:

\[ \ell^*_i(x) = \sum_{j=1}^{n} \lambda(\omega_i \mid \omega_j) \, \mathcal{U}_x(\omega_j), \qquad \ell_{*i}(x) = \sum_{j=1}^{n} \lambda(\omega_i \mid \omega_j) \, \mathcal{L}_x(\omega_j) \tag{5.3.1} \]

where $\mathcal{U}_x(\omega_j)$ and $\mathcal{L}_x(\omega_j)$ are respectively the upper and the lower probabilities
for $x$ being actually from $\omega_j$.

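Based on the interval-valued expected losses, the most desirable
decision rule is the one which has the upper expected loss less than the lower
expected losses of the others, i.e.,

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \ell^*_i(x) < \ell_{*j}(x) \quad \text{for } j = 1, \ldots, n, \; j \ne i \tag{5.3.2} \]

This rule is called an "absolute rule."

The Bayes-like rule is the one which minimizes both the upper and the
lower expected losses, i.e.,

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \ell^*_i(x) \le \ell^*_j(x) \text{ and } \ell_{*i}(x) \le \ell_{*j}(x) \quad \text{for } j = 1, \ldots, n \tag{5.3.3} \]

In particular, when $\lambda$ is the "0-1 loss function", i.e.,

\[ \lambda(\hat{\omega}(x) \mid \omega_j) = \begin{cases} 0 & \text{if } \hat{\omega}(x) = \omega_j \\ 1 & \text{if } \hat{\omega}(x) \ne \omega_j \end{cases} \tag{5.3.4} \]

the interval-valued expected loss in eq. (5.3.1) is simplified as:

\[ \ell^*_i(x) = \sum_{j=1}^{n} \mathcal{U}_x(\omega_j) - \mathcal{U}_x(\omega_i), \qquad \ell_{*i}(x) = \sum_{j=1}^{n} \mathcal{L}_x(\omega_j) - \mathcal{L}_x(\omega_i) \tag{5.3.5} \]

Since the first terms in the right-hand sides are constant for $i = 1, \ldots, n$,
minimizing both $\ell^*_i(x)$ and $\ell_{*i}(x)$ corresponds to maximizing $\mathcal{U}_x(\omega_i)$ and
$\mathcal{L}_x(\omega_i)$. Hence, the decision rule in eq. (5.3.3) becomes

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \mathcal{U}_x(\omega_i) \ge \mathcal{U}_x(\omega_j) \text{ and } \mathcal{L}_x(\omega_i) \ge \mathcal{L}_x(\omega_j) \quad \text{for } j = 1, \ldots, n \tag{5.3.6} \]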

A problem with the above decision rules is that there does not always
exist a class $\omega_i$ which satisfies the condition in eq. (5.3.2) or (5.3.3), which can
lead to ambiguity. In comparing a pair of the interval-valued expected losses,
there are three different kinds of relationships distinguished by their relative
locations:

(1) disjoint intervals;

\[ \ell^*_i(x) \ge \ell_{*i}(x) \ge \ell^*_j(x) \ge \ell_{*j}(x) \tag{5.3.7} \]

(2) overlapped intervals;

\[ \ell^*_i(x) \ge \ell^*_j(x) \ge \ell_{*i}(x) \ge \ell_{*j}(x) \tag{5.3.8} \]

(3) nested intervals;

\[ \ell^*_i(x) \ge \ell^*_j(x) \ge \ell_{*j}(x) \ge \ell_{*i}(x) \tag{5.3.9} \]

The following example illustrates these intervals. 

Example 5.1. Let $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, and let $[\mathcal{L}_x, \mathcal{U}_x]$ denote the interval-valued
probability function of subsets of $\Omega$ given a pattern $x$. Suppose that the basic
probability assignment $m_x$ of $[\mathcal{L}_x, \mathcal{U}_x]$ is given as

$m_x(\{\omega_1\}) = 0.2$, $m_x(\{\omega_2\}) = 0.3$, $m_x(\{\omega_1, \omega_3\}) = 0.34$, $m_x(\{\omega_2, \omega_4\}) = 0.16$

and $m_x(A) = 0$ for any other subsets $A$ of $\Omega$. Then, the interval-valued
probabilities of the singletons are obtained as

                      $\{\omega_1\}$   $\{\omega_2\}$   $\{\omega_3\}$   $\{\omega_4\}$
$\mathcal{L}_x$           0.2       0.3       0         0
$\mathcal{U}_x$           0.54      0.46      0.34      0.16

For the 0-1 loss function, the expected loss interval of $\omega_2$ is nested in that
of $\omega_1$, that of $\omega_1$ overlaps that of $\omega_3$, and that of $\omega_4$ is disjoint from those of
$\omega_1$ and $\omega_2$. The Bayes-like rule does not produce a decision.
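
The numbers in Example 5.1 can be reproduced with a few lines of code. This is a minimal sketch under the assumption that the lower probability of a singleton is the mass assigned to that singleton and the upper probability is the total mass of the focal elements containing it; the function name `bayes_like` is hypothetical and implements the 0-1 loss form of the Bayes-like rule in eq. (5.3.6).

```python
# Basic probability assignment of Example 5.1 (classes are numbered 1..4).
m_x = {
    frozenset({1}): 0.20,
    frozenset({2}): 0.30,
    frozenset({1, 3}): 0.34,
    frozenset({2, 4}): 0.16,
}
classes = (1, 2, 3, 4)

# Lower probability: mass of focal elements contained in the singleton.
lower = {w: sum(v for A, v in m_x.items() if A <= {w}) for w in classes}
# Upper probability: mass of focal elements that contain the class.
upper = {w: sum(v for A, v in m_x.items() if w in A) for w in classes}


def bayes_like(lower, upper):
    """Return the class maximizing both bounds, or None if no such class exists."""
    for i in lower:
        if all(upper[i] >= upper[j] and lower[i] >= lower[j] for j in lower):
            return i
    return None


decision = bayes_like(lower, upper)
for w in classes:
    print(w, round(lower[w], 2), round(upper[w], 2))
print("Bayes-like decision:", decision)   # None: class 1 has the largest upper
                                          # bound, class 2 the largest lower bound
```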

The above example shows a simple case where the Bayes-like decision
rule leads to ambiguity. In such an ambiguous situation, one may withhold the
decision and wait for a new piece of information. Otherwise, the ambiguity may
be resolved by resorting to the following rule, the so-called "minimum average
expected loss rule":

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \frac{\ell^*_i(x) + \ell_{*i}(x)}{2} \le \frac{\ell^*_j(x) + \ell_{*j}(x)}{2} \quad \text{for } j = 1, \ldots, n \tag{5.3.10} \]

For the 0-1 loss function, this rule is called the "maximum average probability
rule", and the decision is made according to

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \frac{\mathcal{U}_x(\omega_i) + \mathcal{L}_x(\omega_i)}{2} \ge \frac{\mathcal{U}_x(\omega_j) + \mathcal{L}_x(\omega_j)}{2} \quad \text{for } j = 1, \ldots, n \tag{5.3.11} \]

As alternatives to the absolute rule and the Bayes-like rule, there are
two other rules by which a decision is made according to a single measure of
the interval, namely either the upper expected loss or the lower expected
loss:

(1) minimum upper expected loss rule:

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \ell^*_i(x) \le \ell^*_j(x) \quad \text{for } j = 1, \ldots, n \tag{5.3.12} \]

For the 0-1 loss function, this rule may be renamed the "maximum upper
probability rule" or "maximum plausibility rule", and the decision is made
according to

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \mathcal{U}_x(\omega_i) \ge \mathcal{U}_x(\omega_j) \quad \text{for } j = 1, \ldots, n \tag{5.3.13} \]

(2) minimum lower expected loss rule:

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \ell_{*i}(x) \le \ell_{*j}(x) \quad \text{for } j = 1, \ldots, n \tag{5.3.14} \]

For the 0-1 loss function, this rule is called the "maximum lower probability
rule" or "maximum support rule", and the decision is made according to

\[ \hat{\omega}(x) = \omega_i \quad \text{if } \mathcal{L}_x(\omega_i) \ge \mathcal{L}_x(\omega_j) \quad \text{for } j = 1, \ldots, n \tag{5.3.15} \]
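
For the 0-1 loss function, the rules that always produce a decision can be compared directly on the evidential intervals of Example 5.1. The sketch below is illustrative; the function names are hypothetical, and ties (which do not occur here) would simply be resolved in favor of the first class encountered.

```python
# Evidential intervals of the singletons in Example 5.1.
lower = {"w1": 0.20, "w2": 0.30, "w3": 0.00, "w4": 0.00}
upper = {"w1": 0.54, "w2": 0.46, "w3": 0.34, "w4": 0.16}


def max_plausibility(lower, upper):
    """Maximum upper probability rule, eq. (5.3.13)."""
    return max(upper, key=upper.get)


def max_support(lower, upper):
    """Maximum lower probability rule, eq. (5.3.15)."""
    return max(lower, key=lower.get)


def max_average(lower, upper):
    """Maximum average probability rule, eq. (5.3.11)."""
    return max(lower, key=lambda w: (lower[w] + upper[w]) / 2.0)


print("maximum plausibility:", max_plausibility(lower, upper))   # w1 (0.54)
print("maximum support:     ", max_support(lower, upper))        # w2 (0.30)
print("maximum average:     ", max_average(lower, upper))        # w2 (0.38 vs 0.37)
```

The three rules need not agree: here the maximum plausibility rule selects the first class while the other two select the second.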

Although the above two rules always produce decisions and there is no
ambiguous situation in making a decision according to the rules, they do not
utilize all of the information represented by the IV probabilities. The
performance of these rules will be compared with the minimum average
expected loss rule in the next chapter by applying them to problems of
ground-cover classification based on remotely sensed and geographic data.


5.4. Summary 


The purpose of this chapter was to formalize the decision-making 
process for any system of interval-valued probabilities. In particular, the 
process was considered from the viewpoint of statistical decision theory. 

First, two different definitions of interval-valued expectations were 
studied, and their statistical properties were compared with those of the ordinary 
expected value. Then the absolute rule and the Bayes-like rule for evidential 
intervals were developed based on the general interval-valued expectation. 
Since these rules are not always satisfied, they may require an extra step to 
resolve ambiguous situations. In order to resolve the ambiguous situations, this 
chapter proposed the minimum average expected loss rule. As alternatives to 
the absolute rule and the Bayes-like rule, the minimum upper expected loss rule 
and the minimum lower expected loss rule were proposed. 

While the absolute rule and the Bayes-like rule make decisions based on 
both the upper and the lower expected losses, the minimum upper expected 
loss rule and the minimum lower expected loss rule make decisions based on 
either the upper or the lower expected loss. In the evidential reasoning, the 
lower probability and the upper probability represent respectively the minimal 
and the maximal degree of belief. Hence, the minimum lower expected loss 
rule may be chosen when the decision process needs to be conservative; and 
the minimum upper expected loss rule may be chosen when the decision maker 
is confident about the information represented by IV probabilities. 

In this chapter, the contribution of the research is in the formal 
development of the decision-making process and the decision rules for interval- 
valued probabilities. 
CHAPTER 6

EXPERIMENTAL RESULTS 


6.1. Introduction 

In this chapter, the methods presented in this report are applied to 
problems of ground-cover classification for multispectral data combined with 
other geographic data. The multisource data (MSD) classification based on the 
evidential reasoning (ER) method is implemented as the following procedure: 


In the training stage, 

1. Compute the global correlation coefficient matrix of multisource data and 
reform the data set if necessary. Throughout the experiments, the global 
correlation information will be used to confirm the “distinctness" of bodies of 
evidence as required by Dempster’s rule. 

2. For each class, select training pixels and compute statistics for each source. 

3. Compute the separability measures of each source and the average 
measures of conflict between pairs of the sources as defined in Section 3.4. 
Rank the data sources and assign a degree of reliability to each source. 

The steps in the test stage classifying "unknown" pixels will be described by
considering an actual problem of classifying a test pixel to one of the classes in
$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$ based on two data sources denoted by $S_1$ and $S_2$.

$x_i$ : Test vector representing the test pixel obtained from $S_i$ ($i = 1, 2$).
$\alpha_i$ : Source reliability of $S_i$, $0 \le \alpha_i \le 1$.
$p_{\omega_j}(x_i)$ : Conditional probability density of $x_i$ given $\omega_j$.
$m_i$ : Basic probability assignment based on $S_i$.
$m$ : Basic probability assignment based on $S_1$ and $S_2$.
$Sp$ : Support function based on $S_1$ and $S_2$.
$Pl$ : Plausibility function based on $S_1$ and $S_2$.

Suppose that $p_{\omega_j}(x_i)$ for $i = 1, 2$ and $j = 1, \ldots, 4$ are obtained such that

$p_{\omega_1}(x_1) \ge p_{\omega_2}(x_1) \ge p_{\omega_3}(x_1) \ge p_{\omega_4}(x_1)$

$p_{\omega_2}(x_2) \ge p_{\omega_3}(x_2) \ge p_{\omega_1}(x_2) \ge p_{\omega_4}(x_2)$

(A) Using the consonant belief functions:

The focal elements based on $S_1$ are $\{\omega_1\}$, $\{\omega_1, \omega_2\}$, $\{\omega_1, \omega_2, \omega_3\}$, and $\Omega$.

The focal elements based on $S_2$ are $\{\omega_2\}$, $\{\omega_2, \omega_3\}$, $\{\omega_2, \omega_3, \omega_1\}$, and $\Omega$.

1. Compute $m_1(A)$ and $m_2(B)$ by using eq. (3.3.9), where $A$ and $B$ denote the
focal elements of $S_1$ and $S_2$, respectively.

2 . Multiply mj by q for the subsets of Q, and add cq to mj(Q). 

3. Compute $m = m_1 \oplus m_2$ by using eq. (4.4.2).

4. For each singleton $\omega_j$, compute

\[ Sp(\{\omega_j\}) = m(\{\omega_j\}) \quad \text{and} \quad Pl(\{\omega_j\}) = \sum_{A \cap \{\omega_j\} \ne \emptyset} m(A) \]

5. Classify the test pixel to a class according to one of the decision rules for
IV probabilities in Chapter 5.

(B) Using the partially consonant belief functions:

Based on the relation in the hierarchical structure of the classes, suppose
that $\Omega$ has a partition $\{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}\}$.

The focal elements based on $S_1$ are $\{\omega_1\}$, $\{\omega_1, \omega_2\}$, $\{\omega_3\}$, and $\{\omega_3, \omega_4\}$.

The focal elements based on $S_2$ are $\{\omega_2\}$, $\{\omega_2, \omega_1\}$, $\{\omega_3\}$, and $\{\omega_3, \omega_4\}$.

1. Compute $m_1(A)$ and $m_2(B)$ by using eq. (3.3.10) and (3.3.11), where $A$
and $B$ denote the focal elements of $S_1$ and $S_2$, respectively.

2 . Multiply mj by cq for the subsets of Q, and add ctj to ^Q). 

3. Compute $m = m_1 \oplus m_2$ by using eq. (4.4.2).

4. For each singleton $\omega_j$, compute

\[ Sp(\{\omega_j\}) = m(\{\omega_j\}) \quad \text{and} \quad Pl(\{\omega_j\}) = \sum_{A \cap \{\omega_j\} \ne \emptyset} m(A) \]

5. Classify the test pixel to a class according to one of the decision rules for
IV probabilities in Chapter 5.
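
The focal elements listed in (A) and (B) appear to follow the ranking of the class-conditional densities: in the consonant case they are nested in decreasing order of likelihood, and in the partially consonant case the same nesting is applied within each block of the partition. The sketch below illustrates only this construction of the focal elements, under that reading. The likelihood values and function names are hypothetical, and the masses themselves (eqs. (3.3.9) to (3.3.11)) are not reproduced here.

```python
def consonant_focal_elements(likelihood):
    """Nested focal elements obtained by adding classes in decreasing likelihood."""
    ranked = sorted(likelihood, key=likelihood.get, reverse=True)
    return [frozenset(ranked[:k]) for k in range(1, len(ranked) + 1)]


def partially_consonant_focal_elements(likelihood, partition):
    """Apply the nested construction separately inside each block of the partition."""
    focal = []
    for block in partition:
        focal.extend(consonant_focal_elements({w: likelihood[w] for w in block}))
    return focal


# Hypothetical densities that respect the two orderings assumed in the text.
p_source1 = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}
p_source2 = {"w2": 0.4, "w3": 0.3, "w1": 0.2, "w4": 0.1}
partition = [("w1", "w2"), ("w3", "w4")]

cbf_1 = consonant_focal_elements(p_source1)
cbf_2 = consonant_focal_elements(p_source2)
pcbf_1 = partially_consonant_focal_elements(p_source1, partition)
pcbf_2 = partially_consonant_focal_elements(p_source2, partition)

for name, focal in (("CBF S1", cbf_1), ("CBF S2", cbf_2),
                    ("PCBF S1", pcbf_1), ("PCBF S2", pcbf_2)):
    print(name, [sorted(A) for A in focal])
```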

Figure 6.1 is the block diagram of the procedure for classifying a pixel in
the MSD classification based on the ER method.

The experiments have been performed over three different image data
sets. Table 6.1 shows the names and types of data sources of the multisource
data sets. More detailed descriptions will be given in the following sections.
Each data set also has a geometrically registered, digitized ground truth map as
a reference based on which the accuracies of all subsequent classifications will
be evaluated.

The next section presents the experimental results of the proposed
method applied to the Anderson River data set. The intention of the experiment
is to assess the ability of the method to capture and utilize the information
obtained from topographic data sources as well as multispectral data sources.
In Section 6.3, the method is applied to the Indiana agricultural area data set,
which contains only a single multispectral data source. The purpose is to show
the possibility that the MSD classification based on the evidential reasoning
method can overcome the effects of the Hughes phenomenon [Hughes (1968)],
which results in lowered classification accuracy for high-dimensional data with
a limited number of training samples. The goal is to show that improved
classification can be obtained by decomposing a high-dimensional data source
into smaller and more manageable pieces and treating them as multiple data
sources. The possibility becomes more concrete in Section 6.4, where the
method is applied to a simulated High Resolution Imaging Spectrometer
(HIRIS) data set which is composed of 201 bands.

In every application, the classification accuracies of the MSD classification
are compared with those of Maximum Likelihood (ML) classifications based
on the stacked vector approach. Since the stacked vector approach treats
compound vectors as data from a single source, the comparison of the MSD
and the ML classifications will assess the advantages and the disadvantages of
the multisource data analysis approach compared to a standard single source
analysis approach used in remote sensing.


Figure 6.1 Block Diagram of Evidential Reasoning Method for Multisource Data Classification.


Table 6.1 Multisource Data Sets.

Name                              Types of Data Sources
Anderson River Data               Airborne MSS, SAR, Elevation, Slope, Aspect
Indiana Agricultural Area Data    Airborne MSS
Finney County Data                HIRIS


6.2. Classification of Multispectral Data combined with Topographic 
Data 

The Anderson River data set* used in the first experiment consists of 3
multispectral data sources (optical and radar) and 3 topographic data sources.
Table 6.2 describes the types of data sources for the first experiment. The
image of this data set consists of 256 lines by 256 columns and covers a
forestry site around the Anderson River area in British Columbia, Canada.
Source 1 is 11-band Airborne Multispectral Scanner data (A/B MSS). Sources
2 and 3 are Synthetic Aperture Radar (SAR) imagery in Shallow mode and
Steep mode, respectively. The column "spectral band" for sources 2 and 3
describes the band and the transmit and receive type of SAR images. For
example, XHV means that the image is obtained in X-band ($\lambda$ = 3 cm) of the
microwave region by horizontal polarization transmit and vertical polarization
receive. Sources 4-6 provide digital terrain data obtained as follows:


* The SAR/MSS Anderson River data set was acquired, processed and loaned 
to Purdue University by the Canadian Center for Remote Sensing, Department 
of Energy, Mines and Resources, of the Government of Canada. 










Table 6.2 Description of Anderson River Data Set 
a) digital elevation model (DEM)

gray level = {elevation (in meters) - 61.996} ÷ 7.2266

b) digital aspect model (DAM)

gray level = aspect (in degrees) ÷ 2

c) digital slope model (DSM)

gray level = slope (in degrees)
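
Reading the three encodings above as a linear offset-and-scale for elevation, a division by two for aspect, and an identity for slope, the conversions and their inverses can be sketched as follows. The function names are hypothetical, and any rounding of the gray levels to integers is deliberately left out because the text does not specify it.

```python
ELEVATION_OFFSET_M = 61.996   # offset in meters, from the DEM formula
ELEVATION_SCALE_M = 7.2266    # meters per gray level, from the DEM formula


def elevation_to_gray(elevation_m):
    """Gray level of the digital elevation model for an elevation in meters."""
    return (elevation_m - ELEVATION_OFFSET_M) / ELEVATION_SCALE_M


def gray_to_elevation(gray):
    """Inverse mapping: elevation in meters for a DEM gray level."""
    return gray * ELEVATION_SCALE_M + ELEVATION_OFFSET_M


def aspect_to_gray(aspect_deg):
    """Gray level of the digital aspect model for an aspect in degrees."""
    return aspect_deg / 2.0


def slope_to_gray(slope_deg):
    """Gray level of the digital slope model (slope in degrees, unchanged)."""
    return slope_deg


print(elevation_to_gray(61.996))        # 0.0, the lowest encodable elevation
print(round(gray_to_elevation(255), 1)) # elevation corresponding to gray level 255
print(aspect_to_gray(360.0))            # 180.0
```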

Table 6.3 lists the information classes in the area, and Figure 6.2 shows
the ground truth map. More than three quarters of the area is covered by mixed
forestry. The information classes were defined based on a forestry map, and it
has been observed that some of the classes are very difficult to classify
accurately. In this experiment, 6 of the more separable classes were selected,
and these are listed in Table 6.4. Figure 6.3 displays the test areas of the 6
classes over the enhanced A/B MSS image. Some of the field labels are not
readable. However, they can be confirmed by the ground truth map in Figure
6.2. Figures 6.4 and 6.5 are Synthetic Aperture Radar imagery in Shallow and
Steep mode, respectively, and Figures 6.6 through 6.8 are the digital terrain
imagery of the data set.

Table 6.5 is the global statistical correlation coefficient matrix among the
data sources. Correlation coefficients between pairs of variables from different
sources are generally quite low compared to those from the same source.
When the data can be assumed to be normally distributed, their
uncorrelatedness implies statistical independence. In the experiments, we treat
the data sources (including the topographic data sources) which have relatively
low correlation as "globally independent" in order to assume that they
reasonably closely satisfy the "distinctness" of bodies of evidence required by
Dempster's rule.

In the experiment with the Anderson River data set, 100 pixels per class 
were used for training data, which is between 4% and 8% of the total pixels of 
the classes in the test fields. The training samples are uniformly distributed over 
the test fields so that they may be considered as good representatives of the 



Table 6.3 Information Classes in Anderson River Data Set.

Class                             Tree         No. of     % of
Index   Cover Types               Sizes        Pixels     Total
  1     Douglas Fir (DF) 1        > 40m          1946      2.97
  2     DF 2                      31 - 40m      13158     20.08
  3     DF 3                      21 - 30m       6576     10.03
  4     DF 4                      10 - 20m       1045      1.59
  5     Bare Soil, Slides                         110      0.17
  6     DF+Other Species 1        > 40m          1973      3.01
  7     DF+Other Species 2        31 - 40m       5761      8.79
  8     DF+Other Species 3        21 - 30m       1309      2.00
  9     DF+Lodgepole Pine 1       31 - 40m        510      0.78
 10     DF+Lodgepole Pine 2       21 - 30m       5636      8.60
 11     DF+Cedar 1                > 40m          2483      3.79
 12     DF+Cedar 2                31 - 40m       2895      4.42
 13     Lodgepole Pine            10 - 20m        113      0.17
 14     Hemlock+Cedar             31 - 40m       3173      4.84
 15     DF+Hemlock                31 - 40m       2961      4.52
 16     Hemlock+DF 1              31 - 40m        825      1.26
 17     Hemlock+DF 2              21 - 30m        456      0.70
 18     Rock, Talus                              1982      3.02
 19     Forest Clearings                        12624     19.26
        Total                                   65536     100.0










Figure 6.13 Ground Truth Map of Indiana Agricultural Area Data Set.





Table 6.4 Information Classes for Test of Anderson River Data Set.

                                      Tree         No. of
Cover Types                           Sizes        Pixels
Douglas Fir 2 (df2)                   31 - 40m       2246
Douglas Fir 3 (df3)                   21 - 30m       1501
DF+Other Species 2 (df+os2)           31 - 40m       1352
DF+Lodgepole Pine 2 (df+lp2)          21 - 30m       1589
Hemlock+Cedar (hc)                    31 - 40m       1587
Forest Clearings (fc)                                2064
Total                                               10339


Figure 6.3 Test Areas over Histogram Equalized A/B MSS
(Channel 10) Image of Anderson River Data Set.


Figure 6.4 Histogram Equalized SAR-Shallow mode (LHH)
Image of Anderson River Data Set.


Figure 6.5 Histogram Equalized SAR-Steep mode (LHH)
Image of Anderson River Data Set.


Figure 6.6 Digital Elevation Image of Anderson River Data Set.






Table 6.5 Global Statistical Correlation Coefficient Matrix of Anderson River Data Set.

(a) A/B MSS bands versus A/B MSS bands

Band      1      2      3      4      5      6      7      8      9     10     11
  1   1.000  0.815  0.753  0.709  0.670  0.633  0.626  0.573  0.459  0.520  0.593
  2          1.000  0.956  0.933  0.905  0.882  0.875  0.686  0.505  0.563  0.747
  3                 1.000  0.975  0.961  0.955  0.951  0.677  0.465  0.516  0.792
  4                        1.000  0.996  0.984  0.981  0.744  0.530  0.570  0.765
  5                               1.000  0.992  0.990  0.742  0.526  0.562  0.761
  6                                      1.000  0.998  0.672  0.442  0.477  0.760
  7                                             1.000  0.684  0.454  0.490  0.773
  8                                                    1.000  0.926  0.956  0.617
  9                                                           1.000  0.959  0.464
 10                                                                  1.000  0.532
 11                                                                         1.000

(b) A/B MSS bands versus SAR and topographic variables

          SAR SHALLOW                   SAR STEEP                     TOPOGRAPHIC
Band    LHH    LHV    XHH    XHV      LHH    LHV    XHH    XHV     Aspect  Eleva  Slope
  1   0.074  0.094  0.102  0.088    -.123  0.008  -.193  -.035     -.076  -.589  -.039
  2   0.082  0.105  0.107  0.097    -.117  0.041  -.190  -.005     -.063  -.546  -.055
  3   0.075  0.103  0.088  0.087    -.099  0.061  -.169  0.017     -.041  -.424  -.061
  4   0.074  0.102  0.082  0.087    -.081  0.076  -.140  0.038     -.031  -.333  -.071
  5   0.069  0.099  0.070  0.082    -.072  0.081  -.128  0.045     -.024  -.271  -.074
  6   0.060  0.089  0.052  0.070    -.065  0.078  -.122  0.044     -.013  -.217  -.066
  7   0.062  0.093  0.051  0.073    -.065  0.079  -.121  0.047     -.009  -.205  -.067
  8   0.103  0.147  0.127  0.139    -.074  0.096  -.101  0.074     -.034  -.327  -.107
  9   0.099  0.145  0.135  0.141    -.066  0.086  -.079  0.069     -.036  -.320  -.100
 10   0.108  0.158  0.136  0.154    -.076  0.083  -.100  0.068     -.042  -.365  -.106
 11   0.092  0.131  0.089  0.110    -.084  0.047  -.152  0.014     -.072  -.341  -.066

(c) SAR and topographic variables versus SAR and topographic variables

                   SAR SHALLOW                   SAR STEEP                     TOPOGRAPHIC
                 LHH    LHV    XHH    XHV      LHH    LHV    XHH    XHV     Aspect  Eleva  Slope
SAR SHALLOW LHH 1.000  0.323  0.447  0.316    0.086  0.097  0.147  0.143      ?14   -.027  -.006
            LHV        1.000  0.312  0.426    0.161  0.164  0.187  0.208      ?06   -.033  0.027
            XHH               1.000  0.326    0.007  0.085  0.105  0.104      ?33   -.177  0.022
            XHV                      1.000    0.161  0.166  0.201  0.216      ?82   -.062  0.046
SAR STEEP   LHH                               1.000  0.348  0.472  0.378     0.094  0.101  0.131
            LHV                                      1.000  0.338  0.558     0.150  -.054  0.064
            XHH                                             1.000  0.391     0.139  0.131  0.124
            XHV                                                    1.000     0.175  0.027  0.072
TOPOGRAPHIC Aspect                                                           1.000  0.127  -.117
            Eleva                                                                   1.000  -.023
            Slope                                                                          1.000

(? marks leading digits that are illegible in the source.)
Figure 6.10 Classwise Histogram of Training Samples of a Subset
of the Classes in the Anderson River Elevation Data.


Figure 6.11 Classwise Histogram of Training Samples of a Subset
of the Classes in the Anderson River Aspect Data.


Figure 6.12 Classwise Histogram of Training Samples of a Subset
of the Classes in the Anderson River Slope Data.
total samples. As we can observe in Figures 6.9 through 6.12, some of the
classes defined in Table 6.4 cannot be assumed to be normally distributed in 
the topographic data. Thus, it was decided to adopt a nonparametric approach 
such as the “Nearest Neighbor” (NN) method [Fukunaga (1972)] in computing 
probability measures while the optical and radar data sources were assumed to 
have Gaussian probability density functions. Table 6.6 compares the overall 
classification accuracies obtained by the ML method with the Gaussian 
assumption and k-NN method for the individual topographic data sources. The 
results show that the topographic data are information-bearing in the sense of 
classification and suggest that the topographic data sources, especially 
Elevation, should be included in the classification. Although the k-NN method 
results in various classification accuracies for different k’s, it always gives higher 
accuracies than the ML method especially for the training data. In the MSD 
classification, interval-valued belief functions for the bodies of statistical 
evidence provided by these topographic data sources were constructed from 
the likelihood functions obtained by the 2-NN method. 
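
A k-NN likelihood for a single gray-level feature can be sketched as follows. This is only an illustration of the general technique and not the report's implementation: it assumes the standard one-dimensional k-NN density estimate k / (N * 2r), where r is the distance to the k-th nearest training sample, and it floors r at half a gray level (a hypothetical choice) so that a test value coinciding with training samples does not produce an infinite density. The class names and sample values are invented.

```python
def knn_density(x, samples, k, min_radius=0.5):
    """One-dimensional k-NN density estimate at x from a list of training samples."""
    distances = sorted(abs(x - s) for s in samples)
    radius = max(distances[k - 1], min_radius)   # distance to the k-th neighbor
    return k / (len(samples) * 2.0 * radius)


def classify(x, training, k):
    """Assign x to the class whose k-NN likelihood is largest."""
    return max(training, key=lambda c: knn_density(x, training[c], k))


# Hypothetical gray-level training samples for two classes.
training = {
    "class_a": [10, 12, 14, 16],
    "class_b": [30, 32, 34, 36],
}

x = 13
likelihoods = {c: knn_density(x, training[c], k=2) for c in training}
print(likelihoods)              # class_a: 0.25, class_b: about 0.013
print(classify(x, training, 2)) # class_a
```

Because such an estimate adapts to the local density of the training samples, it does not require the Gaussian assumption that the histograms call into question.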


Table 6.6 Overall Classification Accuracy (%) obtained by ML
Method and k-NN Method for Topographic Data Sources.

Samples     Method    Elevation    Aspect         Slope
Training    ML          45.83      30.33          29.17
            1-NN        67.00      50.00          48.67
            2-NN        66.67      47.63          46.50
            5-NN        65.50      44.50          45.83
Testing     ML          42.64      32.06          30.72
            1-NN        45.33      35.63          34.51
            2-NN        46.79      (illegible)    (illegible)
            5-NN        45.03      35.29          (illegible)
Table 6.7 Average Measures of Conflict between Pairs of Sources using
Partially Consonant Belief Function for Training Samples.



Table 6.8 Average Measures of Conflict between Pairs of Sources using
Partially Consonant Belief Function for All Samples.
In order to rank the sources by their reliability, the average J-M distance
and the average Transformed Divergence of each source were calculated and 
compared with the overall classification accuracy obtained by the ML method 
over the training samples (Table 3.2). We also computed the average 
measures of conflict between pairs of the sources using the consonant belief 
function (Tables 3.3, 3.4) and the partially consonant belief function (Tables 6.7, 
6.8). Assuming that A/B MSS is the most reliable in the sense of classification, 
all the measures agree that Elevation and SAR-Shallow are the 2nd and the 
3rd, respectively. They do not agree at all for the remaining sources. In the 
multisource data classification with this data set, the remaining sources have 
been considered as equally reliable. 
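
The two separability measures used for ranking can be sketched as follows for the simplest case of univariate Gaussian classes. This is a minimal sketch under stated assumptions: the J-M distance is taken in the form 2(1 - exp(-B)), with B the Bhattacharyya distance, and the Transformed Divergence in the form 2(1 - exp(-D/8)), with D the symmetric divergence. Other scalings of both measures are in use, and the definitions adopted earlier in the report may differ. The function names and the class statistics are hypothetical.

```python
import math


def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians (mean, variance)."""
    v_avg = 0.5 * (v1 + v2)
    return (m1 - m2) ** 2 / (8.0 * v_avg) + 0.5 * math.log(v_avg / math.sqrt(v1 * v2))


def jm_distance_1d(m1, v1, m2, v2):
    """J-M distance in the assumed form 2 * (1 - exp(-B)); saturates at 2."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_1d(m1, v1, m2, v2)))


def divergence_1d(m1, v1, m2, v2):
    """Symmetric divergence between two univariate Gaussians."""
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0) + 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)


def transformed_divergence_1d(m1, v1, m2, v2):
    """Transformed divergence in the assumed form 2 * (1 - exp(-D / 8))."""
    return 2.0 * (1.0 - math.exp(-divergence_1d(m1, v1, m2, v2) / 8.0))


def average_separability(class_stats, measure):
    """Average a pairwise measure over all class pairs of one source."""
    labels = sorted(class_stats)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    return sum(measure(*class_stats[a], *class_stats[b]) for a, b in pairs) / len(pairs)


# Hypothetical (mean, variance) statistics of three classes in one source.
class_stats = {"A": (100.0, 25.0), "B": (110.0, 25.0), "C": (140.0, 36.0)}
avg_jm = average_separability(class_stats, jm_distance_1d)
avg_td = average_separability(class_stats, transformed_divergence_1d)
```

A source with a larger average value separates the classes better on its own, which is the sense in which such averages can be used to rank sources.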

For the purpose of comparison, the ML classification based on the 
stacked vector approach was carried out for various sets of the data sources, 
adding one source at a time to the A/B MSS data in the order Elevation, SAR- 
Shallow, SAR-Steep, Aspect, and Slope. Then the MSD classification was 
performed using different combinations of interval-valued belief functions and 
decision rules. Tables 6.9 and 6.10 compare the results for the training 
samples and the test samples, respectively. Even though the compounded data 
in the ML classification were treated as having Gaussian distributions, the ML 
and the MSD methods produced similar results for the training samples. This is 
not surprising because the ML method uses conventional additive probabilities 
assuming that the knowledge concerning the actual unknown probabilities is 
complete, which is reasonable as far as the training samples are concerned. 

In the MSD classification using the partially consonant belief function 
(PCBF), the information classes were partitioned as {df2, df3, df+lp2} and 
{df+os2, he, fc}. This partition was made on the basis of the classwise 
separability measures of the individual sources so that the average separability 
between the partitions is maximized. 

Comparing the performance of the two belief functions, the consonant 
belief function (CBF) was better for the training samples while PCBF was better 
for the test samples. It is not known at this point whether CBF or PCBF is better. 
As far as the decision rules are concerned, the maximum plausibility (MP) rule 
was superior to the other rules, the maximum support (MS) rule and the 
maximum average probability (MA) rule. It is also not known in general which 
rule is the best. Further research is needed to determine whether guidelines 
can be devised for selection of the belief function and decision rule. 


Table 6.9 Results of ML Classification and MSD Classification 
over Training Samples of Anderson River Data. 

                               CBF                         PCBF
                      Decision Rule               Decision Rule
Sources    ML         MP       MS       MA        MP       MS       MA
1          82.50      -        -        -         -        -        -
1,4        88.67      89.83    88.67    88.50     88.67    86.83    87.50
1,2,4      91.67      92.00    91.17    91.00     91.50    89.67    90.17
1-4        92.00      92.50    91.33    91.67     92.17    91.33    91.83
1-5        92.83      93.17    92.33    91.67     92.67    91.00    91.67
1-6        93.50      94.33    93.67    93.50     93.83    92.17    92.83


Table 6.10 Results of ML Classification and MSD Classification 
over Test Samples of Anderson River Data. 

           Decision                        Sources
           Rule        1        1,4      1,2,4    1-4      1-5      1-6
ML                     74.16    77.77    79.13    78.93    79.80    81.01
CBF        MP          -        80.60    82.39    82.69    83.02    84.54
           MS          -        78.45    81.42    81.67    82.24    83.65
           MA          -        78.21    80.95    82.05    81.88    83.16
PCBF       MP          -        80.86    82.76    83.15    84.27    85.95
           MS          -        78.94    81.31    81.64    83.05    84.16
           MA          -        78.49    81.67    82.25    83.78    84.44
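
The three decision rules compared in Tables 6.9 and 6.10 can be illustrated with a short Python sketch. It assumes that each class carries an interval (lower, upper), read as (support, plausibility), and that the "average probability" of the MA rule is the midpoint of that interval; the midpoint reading, the function name, and the numerical intervals are assumptions made for illustration only.

```python
def decide(intervals, rule="MP"):
    """Pick a class from interval-valued probabilities.

    intervals maps each class label to a (lower, upper) pair, read here as
    (support, plausibility).
    """
    score = {
        "MP": lambda lo, up: up,               # maximum plausibility
        "MS": lambda lo, up: lo,               # maximum support
        "MA": lambda lo, up: 0.5 * (lo + up),  # maximum average (assumed midpoint)
    }[rule]
    return max(intervals, key=lambda label: score(*intervals[label]))


# Hypothetical intervals for three of the Anderson River classes.
intervals = {"df2": (0.30, 0.55), "he": (0.10, 0.60), "fc": (0.35, 0.45)}
choices = {rule: decide(intervals, rule) for rule in ("MP", "MS", "MA")}
```

With these made-up intervals the three rules select three different classes, which shows why the choice of decision rule can change the classification accuracy.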

The MSD classification for all the sources was iteratively performed with 
various degrees of source reliability. In this case, the MP rule was used as a 
decision rule because it produced the best results in the classification of 
multiple data sources with equal reliabilities. Tables 6.11 and 6.12 show the 
overall classification results over the training samples and the test samples, 
respectively. The results show not only that the classification accuracy may 
increase as the reliabilities of the additional data sources are varied but also 
that it can be degraded if the additional data sources are discounted too much. 
It is also observed that the variations in the accuracy by PCBF are relatively 
smaller than those by CBF. This is because the width of a partially 
consonant interval-valued probability is usually less than the width of a 
consonant interval-valued probability, which makes PCBF less sensitive to 
changes in source reliability. 


Table 6.11 Results of MSD Classification over Training Samples of Anderson 
River Data with Various Degrees of Source Reliability. 

                      Source Reliability
           1      2      3      4      5      6      Overall (%)
CBF        1.0    1.0    1.0    1.0    1.0    1.0    94.33
           1.0    0.8    0.8    0.8    0.8    0.8    95.17
           1.0    0.8    0.6    0.8    0.6    0.6    95.83
           1.0    0.7    0.5    0.7    0.5    0.5    95.00
           1.0    0.6    0.4    0.8    0.4    0.4    93.83
PCBF       1.0    1.0    1.0    1.0    1.0    1.0    93.83
           1.0    0.8    0.8    0.8    0.8    0.8    95.00
           1.0    0.8    0.6    0.8    0.6    0.6    95.17
           1.0    0.7    0.5    0.7    0.5    0.5    93.67
           1.0    0.6    0.4    0.8    0.4    0.4    91.67
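
The effect of discounting a source on the width of its evidential interval can be illustrated with the classical Shafer discounting operation, in which every mass is scaled by the reliability and the remainder is assigned to the whole frame. This is only a sketch of that classical operation; the reliability weighting defined earlier in the report for interval-valued probabilities may differ in detail, and the frame, masses, and function names below are hypothetical.

```python
def discount(masses, reliability, frame):
    """Shafer-style discounting: scale every mass by the reliability and
    move the remainder to the whole frame (total ignorance)."""
    frame = frozenset(frame)
    out = {focal: reliability * m for focal, m in masses.items() if focal != frame}
    out[frame] = 1.0 - reliability + reliability * masses.get(frame, 0.0)
    return out


def belief(masses, subset):
    """Total mass of focal elements contained in the subset (lower bound)."""
    subset = frozenset(subset)
    return sum(m for focal, m in masses.items() if focal <= subset)


def plausibility(masses, subset):
    """Total mass of focal elements intersecting the subset (upper bound)."""
    subset = frozenset(subset)
    return sum(m for focal, m in masses.items() if focal & subset)


# Hypothetical consonant mass function on a three-class frame.
frame = {"a", "b", "c"}
masses = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.3, frozenset(frame): 0.1}
discounted = discount(masses, 0.8, frame)

width_before = plausibility(masses, {"b"}) - belief(masses, {"b"})
width_after = plausibility(discounted, {"b"}) - belief(discounted, {"b"})
```

In this toy case the interval for class "b" widens after discounting, which is the mechanism by which a lower reliability weakens the influence of a source on the combined result.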

Overall, the MSD classification using evidential reasoning was able to 
produce higher accuracy than the ML classification. The increase in the 
classification accuracy obtained by the MSD classification should be primarily 
attributed to the ER method’s capability of adequately representing bodies of 
statistical evidence by interval-valued probabilities. Furthermore, the MSD 
classification was capable of incorporating various degrees of source reliability 
into the process by treating the multiple sources separately. It was also 
possible in this particular experiment to utilize non-parametric information using 
the k-NN method together with parametric information. This is another 
advantage of treating the multiple sources separately in the MSD classification. 


Table 6.12 Results of MSD Classification over Test Samples of Anderson 
River Data with Various Degrees of Source Reliability. 

                      Source Reliability
           1      2      3      4      5      6      Overall (%)
CBF        1.0    1.0    1.0    1.0    1.0    1.0    84.54
           1.0    0.8    0.8    0.8    0.8    0.8    85.40
           1.0    0.8    0.6    0.8    0.6    0.6    85.69
           1.0    0.7    0.5    0.7    0.5    0.5    84.25
           1.0    0.6    0.4    0.8    0.4    0.4    83.04
PCBF       1.0    1.0    1.0    1.0    1.0    1.0    85.95
           1.0    0.8    0.8    0.8    0.8    0.8    86.09
           1.0    0.8    0.6    0.8    0.6    0.6    86.74
           1.0    0.7    0.5    0.7    0.5    0.5    85.27
           1.0    0.6    0.4    0.8    0.4    0.4    83.21


6.3. Classification of Single-Source Multispectral Data 

In the previous section, the proposed method was applied to the 
classification of multisource data obtained by various sensors. The data set 
used in this section is 12-band Airborne MSS data whose flightline ID is “CRN 
BLT LO FL21” taken on August 21, 1971. Table 6.13 describes the spectral 
regions and bands of the 12 input channels comprising the MSS data. The size 
of the image is 220 lines by 140 columns, and the image covers an agricultural 
area in Indiana. Figure 6.13 is the ground truth map of this area, which is 
digitized and geometrically registered with the MSS data imagery. 

Although the registration has been made very carefully, the ground truth 
map contains geometric registration errors. The error is more noticeable along 
the boundaries between different ground types. If the whole area were used for 
test, incorrect classifications evaluated on the basis of the ground truth map 
would result not only from bad performance of a classifier but also from the 
geometric registration error. In order to avoid this confusion, test areas were 
chosen. Figure 6.14 shows the test areas on the MSS image (Channels 1, 4, 
9). There were 9 information classes for the test, and Table 6.14 lists them with 
their actual number of pixels counted from the ground truth map. 

This experiment was designed to observe how the proposed method 
overcomes the Hughes phenomenon when the number of training samples is 
small. The strategy underlying the method is to decompose the relatively 
large body of evidence into smaller, more manageable pieces, to assess 
plausibilities based on each piece, and to combine the assessments by a 
combination rule. 
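
The combination step can be sketched with Dempster's rule, which is the combination rule referred to in the discussion of these experiments. The sketch below is a generic implementation for mass functions over subsets of a small frame, not the report's own interval-valued formulation; the function name, the class labels, and the masses are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dicts keyed by frozenset) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are totally conflicting")
    # Renormalize so that the remaining masses sum to one.
    return {focal: m / (1.0 - conflict) for focal, m in combined.items()}


# Two hypothetical sources, each giving partial support to "corn".
frame = frozenset({"corn", "soybean", "oat"})
corn = frozenset({"corn"})
source_1 = {corn: 0.6, frame: 0.4}
source_2 = {corn: 0.7, frame: 0.3}
combined = dempster_combine(source_1, source_2)
```

In this example the combined support for "corn" exceeds the support given by either source alone, which illustrates the reinforcing behavior of the rule when the pieces of evidence agree.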

The set of multiple data sources was formed as shown in Table 6.15 by 
dividing the 12-band MSS data based on the global statistical correlation 
(Table 6.16), which coincides with the spectral regions. As expected, the 
correlations between pairs of bands from different spectral regions (except the 
thermal region) are relatively low compared to those within each spectral 
region. Even though the thermal band was relatively highly correlated with the 
visible bands, we chose to treat it as though it were a distinct source. The 
consequence of having done so is apparent in the experimental results. 



Table 6.13 Description of Airborne MSS Data of Indiana 
Agricultural Area Data Set. 

Spectral Region     Input Channel    Spectral Band (µm)
Visible              1               0.46 - 0.49
                     2               0.48 - 0.51
                     3               0.50 - 0.54
                     4               0.52 - 0.57
                     5               0.54 - 0.60
                     6               0.58 - 0.65
                     7               0.61 - 0.70
Near Infrared        8               0.72 - 0.92
                     9               1.00 - 1.40
Middle Infrared     10               1.50 - 1.80
                    11               2.00 - 2.60
Thermal             12               9.30 - 11.70

Table 6.14 Information Classes in Indiana Agricultural Area Data Set 

Cover Types    No. of Test Samples    % of Total
Corn                  3489               26.11
Soybean               6454               48.31
Non-Farm               593                4.44
Oat                    398                2.98
Wheat                  602                4.51
Sudex                  936                7.01
Hay                    412                3.08
Wood                   361                2.70
Pasture                115                0.86
Total                13360              100.0




















Figure 6.2 Ground Truth Map of Anderson River Data Set. 
(Legend: Douglas Fir 1, 2, 3, 4; DF+Other Species 1, 2, 3; DF+Lodgepole Pine 1; 
DF+Cedar 1, 2; DF+Hemlock; Hemlock+Cedar; Lodgepole Pine; Bare Soil, Slides; 
Forest Clearing.) 


Figure 6.14 Test Areas over A/B MSS (Channel 8) Image 
of Indiana Agricultural Area Data. 

For each class, from 15 to 30 samples uniformly distributed over the test 
fields were selected for training. First, the ML classification was performed with 
various sets of the input bands. Tables 6.17 and 6.18 are the results over the 
training samples and the test samples, respectively. The overall classification 
accuracy is the percentage ratio of the number of the correctly classified pixels 
to the total number of pixels while the average classification accuracy is the 
arithmetic mean of the classwise accuracies. 
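
The difference between the two accuracy figures can be made concrete with a short Python sketch. The class sizes and classwise accuracies below are those of the "1 to 12" row of Table 6.18; the function name is hypothetical, and the overall accuracy is recovered here from rounded classwise percentages, so it matches the tabulated value only to within rounding.

```python
def overall_and_average(class_sizes, class_accuracies):
    """Return (overall, average) accuracy in percent.

    overall weights each classwise accuracy by its number of pixels;
    average is the plain arithmetic mean of the classwise accuracies.
    """
    total = sum(class_sizes)
    overall = sum(n * a for n, a in zip(class_sizes, class_accuracies)) / total
    average = sum(class_accuracies) / len(class_accuracies)
    return overall, average


# Test-sample counts and the "1 to 12" row of Table 6.18.
sizes = [3489, 6454, 593, 398, 602, 936, 412, 361, 115]
accs = [99.08, 97.92, 87.02, 42.71, 68.94, 90.81, 19.90, 66.20, 91.30]
overall, average = overall_and_average(sizes, accs)
```

Because two classes account for about three quarters of the test pixels, the overall accuracy is dominated by them, whereas the average accuracy gives equal weight to the small classes that are classified poorly.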

Then the proposed method was applied to subsets of the input channels, 
treating them as multiple sources. Tables 6.19 and 6.20 show the results of this 
MSD classification over the training samples and the test samples, respectively. 
In this case, the consonant belief function and the maximum plausibility rule 
were adopted, and the “multiple sources" were assumed equally reliable. 

In the ML classification, both the overall and the average accuracies 
increased as the number of features was increased for the training samples; but 
this was not true for the test samples. In the MSD classification utilizing all input 
channels, although both accuracies were below 100% for the training samples, 
they were comparable to or higher than the accuracies produced by the ML 
method. The results exhibit two interesting features. First, the classification 
accuracy for the MSD classifications decreases as the set of bands is more 
finely subdivided. This is because more information in inter-channel statistical 
correlation is lost as the data set is more finely subdivided. Second, there is a 
considerable increase in the average classification accuracy of the MSD 
classification for the test samples as compared to the ML classification 
accuracy. This is expected because the MSD classification classifies pixels based 
on the assessment of multiple sources instead of a single source. This is a 
major advantage of the MSD classification over any single-source data 
classification. While the ML classification based on the stacked vector 
approach combines the features at the raw data level and buries their relative 
reliabilities in the statistical correlation information, the MSD classification 
combines the multiple groups of the features after assessing the individual 
groups with explicit consideration of their relative reliabilities. 


Table 6.15 Divided Sources of Indiana Agricultural Area Data Set. 

Source Index    Spectral Region           Input Channels
1               Visible                   1 to 7
2a              Near Infrared             8, 9
2b              Middle Infrared           10, 11
2c              Near & Middle Infrared    8 to 11
2d              Thermal                   12
2               Infrared                  8 to 12


Table 6.16 Global Statistical Correlation Matrix of Indiana Agricultural Area Data Set. 
(The rows for channels 1 to 6 are not legible in the source.) 

Channel      1       2       3       4       5       6       7       8       9      10      11      12
 7         .909    .939    .904    .733    .885    .936   1.000
 8        -.312   -.355   -.195    .047   -.160   -.353   -.445   1.000
 9         .284   -.267   -.174   -.003   -.140   -.310   -.358    .858   1.000
10         .413    .371    .442    .491    .448    .393    .409    .350    .517   1.000
11         .556    .577    .547    .489    .548    .577    .607    .076    .254    .861   1.000
12         .726    .767    .691    .542    .692    .805    .830   -.520   -.415    .378    .623   1.000


Table 6.17 Results of ML Classification over Training Samples for Various Sets of Input Bands. 

Percent Agreement with Ground Truth Map 

                         Class Index (No. of Pixels per Class)                                      Accuracy
Input        1        2        3        4        5        6        7        8        9
Bands       (30)     (30)     (15)     (15)     (15)     (18)     (15)     (15)     (15)     Overall   Average
1 to 12    100.00   100.00   100.00   100.00   100.00   100.00   100.00   100.00   100.00    100.00    100.00
1 to 7      96.67    96.67   100.00    80.00    93.33    83.33    93.33    86.67    93.33     92.26     91.48
8 to 12    100.00    96.67   100.00    66.67   100.00    94.44    93.33    60.00   100.00     91.67     90.12
8 to 11     96.67    76.67   100.00    46.67   100.00    72.22    93.33    66.67    93.33     83.33     -
8, 9        96.67    83.33    86.67     0.00     6.67    77.78    73.33    20.00    86.67     64.88     -
10, 11     100.00    83.33   100.00    20.00    20.00     0.00    53.33    40.00    93.33     61.90     -
12          83.33    86.67    93.33     0.00    40.00    11.11     0.00     0.00     0.00     43.45     -

(The average accuracies for the last four rows are not legible in the source.) 


Table 6.18 Results of ML Classification over Test Samples for Various Sets of Input Bands. 

Percent Agreement with Ground Truth Map 

                         Class Index (No. of Pixels per Class)                                      Accuracy
Input        1        2        3        4        5        6        7        8        9
Bands      (3489)   (6454)   (593)    (398)    (602)    (936)    (412)    (361)    (115)     Overall   Average
1 to 12     99.08    97.92    87.02    42.71    68.94    90.81    19.90    66.20    91.30     90.97     73.77
1 to 7      89.45    72.89    91.57    41.21    80.56    67.09    38.59    43.49    71.30     75.17     66.23
8 to 12     96.70    91.56    99.16    40.70    95.51    71.47    74.21    54.85    97.39     89.02     80.18
8 to 11     96.10    73.27    97.64    33.92    91.86    63.25    72.33    54.29    94.78     78.92     75.27
8, 9        90.51    82.00    86.68     0.00    10.80    66.35    54.13    15.24    95.65     75.13     55.70
10, 11      93.24    60.75    93.76    10.80    20.26     4.70    56.55    26.87    95.65     62.72     51.40
12          81.11    84.13    90.21     0.00    34.72    37.07     0.00     0.00     0.00     69.99     36.36


Table 6.19 Results of MSD Classification over Training Samples. 

Percent Agreement with Ground Truth Map 

                      Class Index (No. of Pixels per Class)                        Accuracy
Input              1        2        3        4        5        6        7
Sources           (30)     (30)     (15)     (15)     (15)     (18)     (15)     Overall   Average
1, 2             100.00   100.00   100.00    86.67   100.00    94.44   100.00     98.21     -
1, 2c, 2d        100.00   100.00   100.00    80.00   100.00    94.44   100.00     97.62     -
1, 2a, 2b, 2d    100.00    96.67   100.00    73.33   100.00    88.89   100.00     95.24     94.69

(The columns for classes 8 and 9 are not legible in the source apart from a single 
entry, 93.33; the first two Average entries are likewise not legible.) 


Table 6.20 Results of MSD Classification over Test Samples. 

Percent Agreement with Ground Truth Map 

                         Class Index (No. of Pixels per Class)                                          Accuracy
Input              1        2        3        4        5        6        7        8        9
Sources          (3489)   (6454)   (593)    (398)    (602)    (936)    (412)    (361)    (115)     Overall   Average
1, 2              97.70    95.51    96.12    55.78    96.68    84.51    70.87    82.27    97.39     93.08     86.31
1, 2c, 2d         96.85    91.78    95.62    47.74    96.51    76.39    63.11    82.27    93.91     89.97     82.69
1, 2a, 2b, 2d     96.96    91.74    95.28    38.44    93.36    75.64    57.28    85.04    95.05     89.41     81.01

In order to demonstrate the Hughes phenomenon, the ML classification 
over the test samples was performed with various numbers of the best features 
as determined by feature selection using both the J-M distance and the 
Transformed Divergence. The result of the feature selection was, from best to 
worst: 8, 12, 11, 10, 9, 7, 6, 4, 5, 3, 2, and 1. As shown in Figure 6.15, the ML 
method gave the highest accuracies at 8 features (8, 12, 11, 10, 9, 7, 6, 4). 

However, the MSD classification based on the proposed method was 
able to utilize all features when applied to a “multisource” data set consisting of 
two “sources”: one having the 8 best features and the other having the 
remaining 4 features. The first 4 lines in Table 6.21 are the results of 
classification with various degrees of reliability applied to the second source. 

Another set of multisource data was formed by dividing the features into 
two groups each of which has roughly equally good features. The classification 
result from applying the proposed method to this data set is shown in the last 
line of Table 6.21. In this particular case, although the dependencies between 
sources were ignored, the accuracies were the highest. This is due to the 
reinforcing characteristic of Dempster’s rule, which means that the combined 
body of evidence provides stronger support than any individual body of 
evidence. 


Figure 6.15 Results of ML Classification over Test Samples 
for Various Numbers of Features 


Table 6.21 Results of MSD Classification for Data Set 
formed by Feature Selection. 

Bands in Source 1               Bands in Source 2
(Source Reliability)            (Source Reliability)       Overall    Average
8 12 11 10 9 7 6 4   (1.0)      5 3 2 1           (1.0)    95.27      89.29
                     (1.0)                        (0.9)    96.07      90.42
                     (1.0)                        (0.8)    96.65      90.07
                     (1.0)                        (0.7)    96.81      89.96
8 11 9 6 5 2         (1.0)      12 10 7 4 3 1     (1.0)    96.89      91.13


6.4. Classification of HIRIS Data 

The High Resolution Imaging Spectrometer (HIRIS) is an Earth Observing 
System (EOS) sensor developed for high spatial and high spectral resolution. It 
can provide more information in the 0.4 to 2.5 µm spectral region than any other 
earth-observing sensor. Table 6.22 compares some of the attributes of HIRIS 
and earlier Earth-observing satellite sensors [Goetz and Herring (1989)]. 

The high dimensionality of HIRIS data causes several difficulties in 
classifying the data. In addition to the high computational cost of classifying 
such data, a very large number of training samples is necessary in order to 
estimate the statistical parameters accurately using all 192 channels. 
Furthermore, unless these parameters can be accurately estimated, it is not even 
possible to use statistical feature selection techniques to reduce the 
dimensionality. 

In this section, the proposed method is applied to the classification of 
HIRIS data by decomposing the data into smaller pieces, i.e., subsets of 
spectral bands. The data set used in this experiment is simulated HIRIS data 
obtained by RSSIM [Kerekes and Landgrebe (1989)]. RSSIM is a simulation 
tool for the study of multispectral remotely sensed images and associated 
system parameters. It creates realistic multispectral images based on detailed 
models of the ground surface, the atmosphere, and the sensor. Table 6.23 
provides a description of the simulated HIRIS data set. 


Table 6.22 Comparisons of MSS, Thematic Mapper (TM) and HIRIS. 

                         MSS               TM                 HIRIS
No. of Spectral Bands    4                 7                  192
IFOV (ground)            79 m              30/120 m           30 m
Dynamic Range            6/7 bits          8 bits             12 bits
Swath Width              185 km            185 km             30 km
Data Rate                7.63 Mbits/sec    67.4 Mbits/sec     300 Mbits/sec
Spectral Region          0.5 - 1.1 µm      0.45 - 0.90 µm     0.4 - 2.5 µm
                                           1.55 - 1.75 µm
                                           2.08 - 2.35 µm
                                           10.4 - 12.5 µm
Spectral Resolution      0.1 - 0.3 µm      0.6 - 2.27 µm      0.01 µm

Figure 6.16 is a visual representation of the global statistical correlation 
coefficient matrix of the data. The image is produced by converting the absolute 
values of coefficients to gray values between 0 and 255. Based on the 
correlation image, the 201 bands were divided into 3 groups in such a way that 
intra-correlation is maximized and inter-correlation is minimized. Table 6.24 
describes the multisource data set after division. Note that the spectral regions 
of the input channels in Source 3 coincide with the water absorption bands. 
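
The construction of a correlation image such as Figure 6.16 can be sketched in plain Python as follows. This is an illustrative sketch only: the function names and the tiny three-band sample are hypothetical, the linear mapping of |r| onto 0 to 255 is the one described above, and a band with zero variance would need separate handling because its correlation is undefined.

```python
import math


def correlation_matrix(samples):
    """Pearson correlation between bands; samples is a list of pixel vectors."""
    n_bands = len(samples[0])
    n = len(samples)
    means = [sum(px[b] for px in samples) / n for b in range(n_bands)]
    centered = [[px[b] - means[b] for b in range(n_bands)] for px in samples]
    norms = [math.sqrt(sum(row[b] ** 2 for row in centered)) for b in range(n_bands)]
    return [
        [
            sum(row[i] * row[j] for row in centered) / (norms[i] * norms[j])
            for j in range(n_bands)
        ]
        for i in range(n_bands)
    ]


def to_gray(corr):
    """Map |r| in [0, 1] linearly onto integer gray values 0..255."""
    return [[min(255, int(round(255 * abs(r)))) for r in row] for row in corr]


# Hypothetical pixels with three bands: bands 1 and 2 are perfectly
# correlated, band 3 is uncorrelated with both.
samples = [(1, 3, 1), (2, 5, -1), (3, 7, -1), (4, 9, 1)]
gray = to_gray(correlation_matrix(samples))
```

In such an image, groups of mutually correlated bands appear as bright square blocks along the diagonal, which is what makes a visual division of the bands into sources possible.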

With 225 training samples (a third of the total samples) for each class, the 
ML classification and the multisource data classification using the consonant 
belief function and the maximum plausibility decision rule were performed over 
the total samples for various sets of the sources, and the results are listed in 
Tables 6.25 and 6.26. In the multisource data classification for Source 1 and 
Source 2, first the sources were given equal reliability and then Source 2 
was discounted with degree of reliability 0.9 to show the effect of varying 
degrees of reliability on the classification accuracy. 


Table 6.23 Description of Simulated HIRIS Data Set. 

Name                   Finney County Data Set
Data Type              201-band HIRIS data simulated by RSSIM
Spectral Region        0.4 - 2.4 µm
Spectral Resolution    0.01 µm
Image Size             45 lines x 45 columns (2025 samples)
Information Classes    Winter Wheat, Summer Fallow, Unknown


Figure 6.16 Global Statistical Correlation Coefficient Image 
of Finney County Data Set. 

The results of the ML method apparently show effects of the Hughes 
phenomenon; the accuracy goes down as the dimensionality of the source 
increases while the number of training samples is fixed. In particular, the 
accuracy decreases by a considerable amount when all features are used. 
Presence of the Hughes phenomenon causes the ML method to be particularly 
sensitive to a bad source, Source 3 in this case. Meanwhile, the proposed 
MSD classification method always shows robust performance and gives 
consistent results. 

To explore how to handle a situation in which the training samples were 
too limited to permit use of all available features, both methods were run again 
with 68 training samples (10% of the total samples), and the results are shown 
in Table 6.27. In this case, the features were selected with a uniform spectral 
interval from Source 1 and Source 2, excluding the features in Source 3. The 
table shows the number of features actually used for the subdivided sources. 
Four cases were run, each with a different spectral interval, resulting in a total of 
51, 40, 31, and 20 features, respectively. The proposed method performed 
better in all four cases than did the ML method. 


Table 6.24 Divided Sources of HIRIS Data Set. 

Source Index    Input Channels                   No. of Features
Source 1        1 - 35, 107 - 141, 157 - 201     115
Source 2        36 - 95                          60
Source 3        96 - 106 (1.35 - 1.45 µm)        26
                142 - 156 (1.81 - 1.95 µm)


Table 6.25 Results of ML Classification with 225 Training Samples. 

Source                         S1       S2       S3       S1, S2    All
Classification Accuracy (%)    75.75    75.60    45.83    74.56     65.14


Table 6.26 Results of Multisource Data Classification 
with 225 Training Samples. 

        Reliability of                 Classification
S1      S2      S3                     Accuracy (%)
1.0     1.0     1.0                    77.63
1.0     1.0     not used               77.83
1.0     0.9     not used               78.32


Table 6.27 Results of Classifications with 68 Training Samples. 

                        Classification Accuracy (%)
Sources       S1   S2      S1   S2      S1   S2      S1   S2
# Features    33   18      27   13      21   10      14    6
ML              77.43        82.40        82.86        81.82
MSDC            82.22        84.10        85.04        81.90


6.5. Discussion 

In this chapter, the Evidential Reasoning (ER) multisource data 
classification method presented in Chapters 3, 4, and 5 has been applied to the 
ground-cover classification of various multisource data sets. Once it is 
determined which belief function and decision rule will be used, the 
implementation of the method is as easy as implementing a typical ML method. 

The first experiment, with the multisource data set consisting of 3 multi- 
channel data sources and 3 topographic data sources, was intended to assess 
the ability of the ER method in capturing and utilizing the information obtained 
from the topographic data sources as well as the multispectral data sources. In 
this particular experiment, some of the classes could not be assumed to be 
normally distributed in the topographic data. Thus, in the MSD classification 
based on the ER method, the nonparametric Nearest Neighbor method was 
adopted to compute the likelihood functions of test samples, which were then 
used to construct the IV belief functions for the bodies of evidence provided by 
the topographic data sources. By treating the multiple data sources separately, 
the proposed method was able not only to utilize nonparametric information 
together with parametric information but also to incorporate various degrees of 
source reliability into the process. The method provides more than one choice 
for representation of statistical evidence and a decision rule; these choices give 
a lot of flexibility to the multisource data analysis. At this point in the research it 
is not known exactly which choices should be made in general; the choices 
must depend on our knowledge concerning the context of the specific problem, 
such as the hierarchical structure of information classes and the amount and 
reliability of available information. 

The ER method was also applied to the classification of two single-source 
data sets: 12-band A/B MSS data and 201-band simulated HIRIS data. 
Both experiments were designed to observe how effectively the proposed 
method utilizes the available features and overcomes the Hughes phenomenon 
when the number of training samples is small. From single-source data a 
multisource data set was formed by decomposing the high-dimensional data 
into smaller and more manageable pieces based on the global statistical 
correlation information. 

In the experimental results for the 12-band A/B MSS data, two 
observations were made: (1) the classification accuracy of the MSD 
classifications decreased as the set of bands was more finely subdivided, and 
(2) the average classification accuracy of the MSD classification increased 
significantly compared to the ML classification accuracy. According to the first 
observation, inter-channel statistical correlation must be kept within the 
subdivided sources (consistent with the independence assumption of 
Dempster’s combination rule). Similar results were observed when the 
classification was performed for the set of features subdivided based on feature 
selection. Although dependencies between sources were ignored, the 
classification accuracy was increased due to the reinforcing characteristic of 
Dempster's rule. 

The experimental results for the 201-band simulated HIRIS data showed 
that the MSD classification provided robust and consistent performance despite 
the existence of an inconsistent source when training samples were very 
limited. The information obtained from an inadequate number of training 
samples is considered to be inexact and incomplete. The results 
demonstrated the ability of the ER method to capture uncertain information 
based on inexact and incomplete bodies of evidence, and consequently to 
utilize features as effectively as possible. 


CHAPTER 7 

CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH 


7.1 Conclusions 

The problem of drawing inferences using subjective probability 
measures is not a trivial one, especially when it involves multiple information 
sources associated with various degrees of relative reliability. In this report we 
have investigated how interval-valued probabilities can be used to represent 
and integrate evidential information obtained from various data sources. 

IV probability is a generalization of the conventional point-valued 
probability. It is known to be a more adequate scheme than the 
conventional additive probabilities for representing partial information provided 
by inexact and incomplete sources. Chapter 2 reviewed various systems of IV 
probabilities and introduced an axiomatic approach to IV probabilities. In the 
axiomatic approach the upper and the lower probabilities are given by a pair of 
set-theoretic functions. 

One of the basic problems in applying IV probabilities to a real-world 
problem is how to infer the upper and the lower probability functions given a 
body of evidence. Chapter 3 investigated formal methods of constructing IV 
probability functions when the given body of evidence is based on the 
outcomes of statistical experiments governed by a probability model. This 
report has mainly focused on the two IV belief functions, the consonant and the 
partially consonant belief functions, which are based on the Likelihood 
Principle. Even though they require certain assumptions which are not difficult 
to satisfy in practice, they have mathematically simple and readily usable 
formulas. In order to include the relative reliabilities of sources in a multisource 
data analysis, attempts were made to represent the degree of reliability quantitatively by 
the average Jeffries-Matusita distance, the average Transformed Divergence, 
and the average measures of conflict between pairs of sources. 
These measures were used to rank the multiple sources according to the 
relative reliabilities of the sources. 

In the analysis of multiple data sources, a combination rule is an 
essential tool in order to base inferences and decisions on all available 
information. Chapter 4 formally stated desirable properties of combination rules 
and investigated the inferencing mechanisms of the subjective Bayesian 
updating rules and Dempster’s rule for combining multiple bodies of evidence. 
It was also noted that Dempster's rule is a generalized form of Bayesian 
inference, which is characteristically reinforcing and robust to small variations in 
the probability measures to be combined. The robustness of Dempster’s rule was 
analyzed in terms of its differential behavior under slight changes of the 
initial belief measures. 

Chapter 5 presented an account of basic elements in the decision theory 
for pattern recognition based on IV probabilities and developed the absolute 
rule and the Bayes-like decision rule for evidential intervals on the basis of the 
general interval-valued expectation. A problem with these rules is that 
ambiguous situations may arise in which decisions cannot be made. The 
minimum average expected loss rule was proposed to resolve such ambiguous 
situations. Further, the minimum upper expected loss rule and the minimum 
lower expected loss rule were proposed as alternatives to the previous two 
rules. 

Overall concepts of interval-valued probabilities have been implemented 
and evaluated as a new method for classification of multisource data in remote 
sensing. As described in Chapter 6, the proposed method was applied to three 
separate data sets: one consisting of three multi-channel data sources and 
three topographic data sources, and two consisting of single-source 
multispectral data. The purpose of applying the method to the single-source 
data sets was to use as many features as possible effectively when training 
samples are limited, by decomposing a large number of channels into smaller, 
more manageable subsets based on their global statistical correlation. 

In the method each data source is considered as a body of evidence 
providing partial information. When the body of evidence is represented by IV 
probabilities, the width of the interval represents the uncertainty associated with 
the corresponding source. The method combines the individual bodies of 
evidence into the total body of evidence. By treating the data sources 
separately, the method is not only able to utilize both parametric and 
nonparametric information but also able to incorporate various degrees of 
source reliability in the multisource data analysis. 
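
The link between interval width and uncertainty can be illustrated with a 
short sketch. It computes the lower (belief) and upper (plausibility) 
probabilities of a class from a hypothetical basic probability assignment and 
then applies Shafer's classical discounting as one possible way of expressing 
source reliability. Discounting is used here as an assumption for 
illustration; it is not necessarily the mechanism used in this report, and 
the class names and mass values are invented. 

```python
def belief(m, a):
    """Lower probability: total mass of focal elements contained in a."""
    return sum(w for s, w in m.items() if s <= a)

def plausibility(m, a):
    """Upper probability: total mass of focal elements intersecting a."""
    return sum(w for s, w in m.items() if s & a)

def discount(m, frame, reliability):
    """Shafer-style discounting: scale masses and move the rest to the frame."""
    out = {s: reliability * w for s, w in m.items() if s != frame}
    out[frame] = 1.0 - reliability + reliability * m.get(frame, 0.0)
    return out

# Hypothetical frame and assignment for one data source (assumptions).
frame = frozenset({"forest", "water", "urban"})
m = {
    frozenset({"forest"}): 0.6,
    frozenset({"forest", "water"}): 0.3,
    frame: 0.1,
}
target = frozenset({"forest"})

# Interval [0.6, 1.0] for "forest" from the original assignment.
width_before = plausibility(m, target) - belief(m, target)

# Treating the source as only 50% reliable widens the interval to [0.3, 1.0].
m_discounted = discount(m, frame, reliability=0.5)
width_after = plausibility(m_discounted, target) - belief(m_discounted, target)
```

A less reliable source therefore yields a wider interval, that is, a larger 
expressed uncertainty about the same class. 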

The experimental results showed that compared to the conventional ML 
classification, the proposed method gave higher and more robust classification 
accuracies for test samples even when a far less reliable source was included 
in the data set. The increase in average classification accuracy was 
noteworthy. The results also showed that the classification accuracies could be 
increased by varying the degree of reliability assigned to each source as well 
as by choosing an appropriate decision rule. 

The most important feature of the method is the capability of plausible 
reasoning under uncertainty in pattern recognition, especially where multiple 
data sources are not 100% reliable or provide conflicting information. The 
method of classification for multisource data based on IV probabilities can also 
be used to good advantage when there are only small numbers of training 
samples and reliable estimation of statistical information requires dividing the 
high-dimensional data into lower-dimensional subsets. 


7.2 Suggestions for Further Research 

The Evidential Reasoning method developed in this work could be 
further improved in the following respects: 

(1) Computational complexity: It is apparent that the processing time will 
increase as the number of sources increases. Furthermore, since Dempster's 
rule computes the IV probability of a subset A ⊂ Ω as the sum of the basic 
probability assignments of A and all the subsets of A, the computational 
complexity grows exponentially with the number of elements in Ω. A possible 
way to reduce the computation is to restrict the number of focal elements to be 
considered. In a remote sensing context, this is possible by designing the 
classes hierarchically. 

(2) Generalization of the minimum average expected loss rule: Although the 
minimum upper expected loss rule (maximum plausibility rule) produced the best 
results in the experiments, this outcome is attributed to the particular 
belief functions used. In general, the minimum average expected loss rule is 
considered more reliable than the other rules because it takes both the upper 
and the lower probabilities into account. This rule may be generalized by 
considering the IV expected loss as a convex set of measures. 
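
The exponential growth described in item (1), and the saving obtained from a 
hierarchical class design, can be made concrete with a small count. The sketch 
below compares the number of non-empty subsets of a frame with the number of 
candidate focal elements when only the nodes of a class hierarchy are allowed. 
The class tree and the function names are hypothetical and serve only as an 
illustration. 

```python
from itertools import combinations

def power_set(frame):
    """All non-empty subsets of the frame (2**n - 1 of them)."""
    items = sorted(frame)
    return [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    ]

def hierarchy_focal_elements(tree):
    """Collect the set of leaf classes under every node of a class tree.

    tree: a class label (str) or a tuple of subtrees.
    """
    elements = []

    def leaves(node):
        if isinstance(node, str):
            s = frozenset({node})
        else:
            s = frozenset().union(*(leaves(child) for child in node))
        elements.append(s)
        return s

    leaves(tree)
    return elements

# Hypothetical two-level ground-cover hierarchy (assumption).
tree = (("conifer", "deciduous"), ("water", ("urban", "bare soil")))
restricted = hierarchy_focal_elements(tree)
frame = restricted[-1]             # the root node covers all five classes
unrestricted = power_set(frame)

# Five classes give 31 non-empty subsets but only 9 hierarchy nodes.
```

For a binary hierarchy the number of nodes grows linearly with the number of 
classes, whereas the number of subsets doubles with each added class. 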